Closed lizgzil closed 4 months ago
In gje_formatting.py we have a step that converts all single quotes to double quotes, which is needed in the GJE.
The code is:

    for col_name in [
        "top_5_socs",
        "top_5_green_skills",
        "top_5_not_green_skills",
        "top_5_sics",
        "top_5_itl2_quotient",
        "top_5_similar_occs",
    ]:
        occ_agg_extra_loaded[col_name] = occ_agg_extra_loaded[col_name].str.replace(
            "'", '"'
        )
but this causes issues when the text itself contains a single quote (an apostrophe), so we get e.g. "Manufacture of other builders" carpentry and joinery".
Note: this isn't as simple as only replacing the quotes at the start and end, because top_5_socs looks something like the following (and is a string):
"['Gardening', 'Manufacture of other builders' carpentry and joinery', 'something else', 'something else', 'something else']"
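One possible way around this, sketched here as an illustration rather than as the fix that was actually applied: instead of replacing every single quote, split the string on the `', '` delimiter between items and re-serialise the items with `json.dumps`, which adds the double quotes and leaves apostrophes inside the text alone. The function name `single_to_double_quoted_list` is hypothetical, and the sketch assumes every value has the `['...', '...']` shape shown above and that no item itself contains the sequence `', '`.

```python
import json


def single_to_double_quoted_list(list_str):
    """Re-quote a "['a', 'b']"-style string as '["a", "b"]'.

    Hypothetical helper: assumes items are wrapped in single quotes and
    separated by "', '", and that no item contains that separator itself.
    """
    # Leave missing values (e.g. NaN) and other non-strings untouched.
    if not isinstance(list_str, str):
        return list_str

    stripped = list_str.strip()
    # Leave anything that does not match the expected shape untouched.
    if not (stripped.startswith("['") and stripped.endswith("']")):
        return list_str

    # Drop the leading "['" and trailing "']", then split between items.
    items = stripped[2:-2].split("', '")

    # json.dumps wraps each item in double quotes and escapes any
    # double quotes that appear inside the text.
    return json.dumps(items, ensure_ascii=False)


example = (
    "['Gardening', 'Manufacture of other builders' carpentry and joinery', "
    "'something else']"
)
converted = single_to_double_quoted_list(example)
```

Under those assumptions, this could replace the `.str.replace("'", '"')` call inside the loop with `.apply(single_to_double_quoted_list)`. An item that contains `', '` would still be split incorrectly, so the output is worth spot-checking against the real data.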