orchid-initiative / synthetic-database-project

MIT License

Fixed Width Conversion Review #48

Closed: iqtang closed this 1 year ago

TravisHaussler commented 1 year ago

Seems like a reasonable start. You may end up with a dictionary that has tuple values, or even a dictionary of dictionaries (or a pandas DataFrame, if that's easier), to store additional information about each field and how it fits into the fixed-width result. One point of caution is ordering: dictionaries only guarantee insertion order from Python 3.7 onward, so it's worth checking which Python version the project targets.
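For illustration, here is a minimal sketch of both shapes mentioned above: a dictionary with tuple values and the equivalent dictionary of dictionaries. The field names, lengths, and 'left' justification come from the field list in this thread; the variable names are hypothetical. On Python 3.7+ both dictionaries iterate in the order the fields were declared.

```python
# Hypothetical field spec: field name -> (length, justification).
field_spec = {
    'Type of Care': (1, 'left'),
    'Facility Identification Number': (6, 'left'),
}

# Same information as a dict of dicts, so attributes are named
# instead of positional (easier to extend with e.g. a data type).
field_spec_named = {
    name: {'length': length, 'justification': justification}
    for name, (length, justification) in field_spec.items()
}

# On Python 3.7+, iteration follows insertion order, so fields come
# out in the same order they were declared above.
for name, attrs in field_spec_named.items():
    print(name, attrs['length'], attrs['justification'])
```

The dict-of-dicts form is the easier one to grow if more per-field attributes turn out to be needed later.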

TravisHaussler commented 1 year ago

I wonder if it might be easier to create a single list of dictionaries in the class definition and use it for the whole project (instead of the existing self.final_fields):

import pandas as pd

fields = [
    {'name': 'Type of Care', 'length': 1, 'data_type': str, 'justification': 'left'},
    {'name': 'Facility Identification Number', 'length': 6, 'data_type': str, 'justification': 'left'},
    # ...
    {'name': 'procedures', 'length': 375, 'data_type': list, 'justification': 'left'},
    # ...
]
fieldsinfo = pd.DataFrame(fields)  # one row per field, one column per attribute

Then your fixed-width assembly logic can look up each field's attributes (length, data type, justification) in the dataframe as it goes.
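As a sketch of what "looking up fields by their attributes" could mean in practice, the lengths in the field table are enough to derive each field's start and end offset in the fixed-width line. The first two rows come from this thread; 'Example Field' and the 'start'/'end' columns are hypothetical additions for illustration.

```python
import pandas as pd

# Hypothetical subset of the field table (third row is made up).
fieldsinfo = pd.DataFrame([
    {'name': 'Type of Care', 'length': 1, 'justification': 'left'},
    {'name': 'Facility Identification Number', 'length': 6, 'justification': 'left'},
    {'name': 'Example Field', 'length': 4, 'justification': 'right'},
])

# A field starts where the previous fields end: running total of lengths
# minus the field's own length gives its 0-based start offset.
fieldsinfo['start'] = fieldsinfo['length'].cumsum() - fieldsinfo['length']
fieldsinfo['end'] = fieldsinfo['start'] + fieldsinfo['length']

# Look up one field's attributes by name.
facility = fieldsinfo.set_index('name').loc['Facility Identification Number']

# Total width of one fixed-width record.
record_width = int(fieldsinfo['length'].sum())
```

Here the offsets come out as 0, 1, and 7, and each record is 11 characters wide.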

You would just need to update the existing CSV export line " self.output_df[self.final_fields].to_csv( f'{output_loc}/formatteddata{datetime.strftime("%d-%m-%Y%H%M%S")}.csv', index=False) " so that it selects columns using the 'name' column of fieldsinfo (i.e. its first column) instead of self.final_fields.
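A minimal sketch of that column selection, assuming fieldsinfo has a 'name' column as in the field list above. The sample data and the 'scratch_column' are hypothetical, and the output is written to a string rather than to the timestamped file so the example runs standalone.

```python
import pandas as pd

# Hypothetical field table; only the field names come from this thread.
fieldsinfo = pd.DataFrame([
    {'name': 'Type of Care', 'length': 1, 'justification': 'left'},
    {'name': 'Facility Identification Number', 'length': 6, 'justification': 'left'},
])

# Hypothetical working data, including a column that should not be exported.
output_df = pd.DataFrame({
    'Facility Identification Number': ['123456'],
    'Type of Care': ['A'],
    'scratch_column': ['not exported'],
})

# The 'name' column both selects and orders the export columns,
# taking the place of the hard-coded self.final_fields list.
export_cols = fieldsinfo['name'].tolist()

# With no path argument, to_csv returns the CSV text as a string.
csv_text = output_df[export_cols].to_csv(index=False)
print(csv_text)
```

In the class itself the same idea would read as self.output_df[fieldsinfo['name'].tolist()].to_csv(...), keeping the existing path and index=False arguments.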

TravisHaussler commented 1 year ago

Also, I would focus on formatting and assembling each fixed-width line of text all at once rather than trying to format the fields within self.output_df.

I think your function should only read the data from the dataframe and use it to build the fixed-width text, rather than changing the data within self.output_df. I know I left either option open when we talked yesterday, but after thinking more, I believe this is the cleaner approach.
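Here is a minimal sketch of that read-only approach: the function walks the rows of the dataframe, pads or truncates each value according to the field table, and returns the assembled lines without touching the dataframe. The function name, the sample data, and the formatting rules (str() conversion, truncation of over-long values, 'left'/'right' justification only) are assumptions for illustration, not the project's code. List-typed fields such as 'procedures' would need their own rule, which is not shown.

```python
import pandas as pd

def build_fixed_width_lines(df, fieldsinfo):
    """Return one fixed-width string per row of df; df is not modified.

    Assumed rules (hypothetical): each value is converted with str(),
    truncated to the field length, then padded on the right for 'left'
    justification or on the left otherwise.
    """
    fields = fieldsinfo.to_dict('records')  # list of per-field dicts, in order
    lines = []
    for _, row in df.iterrows():
        parts = []
        for field in fields:
            value = str(row[field['name']])[: field['length']]
            if field['justification'] == 'left':
                parts.append(value.ljust(field['length']))
            else:
                parts.append(value.rjust(field['length']))
        lines.append(''.join(parts))
    return lines

# Hypothetical usage data.
fieldsinfo = pd.DataFrame([
    {'name': 'Type of Care', 'length': 1, 'justification': 'left'},
    {'name': 'Facility Identification Number', 'length': 6, 'justification': 'left'},
    {'name': 'Example Count', 'length': 4, 'justification': 'right'},
])
output_df = pd.DataFrame({
    'Type of Care': ['A', 'B'],
    'Facility Identification Number': ['123', '4567890'],  # second is too long
    'Example Count': [7, 12],
})
snapshot = output_df.copy()
lines = build_fixed_width_lines(output_df, fieldsinfo)
```

Because the function only reads from output_df, the same dataframe can still feed the CSV export unchanged.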

TravisHaussler commented 1 year ago

This latest approach seems good to me, but I'm a little out of my depth just reading the code rather than running it myself and googling the errors. I think you're well on your way to solving it, though.