mysociety / contract-countdown

https://mysociety.github.io/contract-countdown/

DtypeWarning when running import_tenders #19

Closed sagepe closed 1 year ago

sagepe commented 2 years ago

When running import_tenders, we see the following warning that may or may not need action:

/app/procurement/management/commands/import_tenders.py:83: DtypeWarning: Columns (23) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv(self.data_file)

zarino commented 2 years ago

Looks like the proper solution here is to pass a dict of column names and types as the dtype argument to pd.read_csv() in commands/import_tenders.py.

But there are 60 columns in the CSV, so it's not trivial to go through and specify a type for each. Any thoughts, @struan @alexander-griffen?
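For what it's worth, the dict passed as `dtype` does not have to cover every column; pandas infers any column that is left out. A minimal sketch of pinning only the offending column follows. It assumes the `23` in the warning is a zero-based column position, and the sample data and the `read_with_explicit_dtype` helper are hypothetical stand-ins, not code from this repo.

```python
import io

import pandas as pd

# Hypothetical stand-in for the tenders CSV; the real file has about 60 columns.
SAMPLE_CSV = (
    "id,title,value\n"
    "1,Road repairs,1000\n"
    "2,Street lighting,unknown\n"
    "3,Waste collection,2500\n"
)


def read_with_explicit_dtype(source, column_index, dtype=str):
    """Read a CSV, forcing one column (picked by position) to a fixed dtype."""
    # Read only the header row to turn the position from the warning into a name.
    header = pd.read_csv(source, nrows=0)
    column_name = header.columns[column_index]
    # Rewind file-like objects so the second read starts from the top.
    if hasattr(source, "seek"):
        source.seek(0)
    # Only the named column is pinned; all other columns are still inferred.
    return pd.read_csv(source, dtype={column_name: dtype})


# In the real command this would be the data file path and column_index=23.
df = read_with_explicit_dtype(io.StringIO(SAMPLE_CSV), column_index=2)
```

Reading the mixed column as `str` keeps every value as text, so any numeric conversion would then need to happen explicitly later in the import.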

alexander-griffen commented 2 years ago

Another solution would be to set low_memory=False. This isn't the 'proper' solution, but it would avoid going through each of the 60 columns. I've used this before and it hasn't broken anything; I assume it would just make the import process use more memory. Could this become a problem if we're doing frequent data updates?
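A minimal sketch of that option, with the resulting frame's memory footprint measured so the cost can be checked against a real file. The sample data and the `read_whole_file` helper are hypothetical; in the command the argument would be the existing data file path.

```python
import io

import pandas as pd

# Hypothetical stand-in for the tenders CSV.
SAMPLE_CSV = (
    "id,title,value\n"
    "1,Road repairs,1000\n"
    "2,Street lighting,unknown\n"
)


def read_whole_file(source):
    """Read a CSV with chunked type inference turned off."""
    # low_memory=False makes the parser process the file in one go instead of
    # in internal chunks, so each column's type is inferred from all its values.
    return pd.read_csv(source, low_memory=False)


df = read_whole_file(io.StringIO(SAMPLE_CSV))

# Size of the loaded DataFrame in bytes, including string contents.
memory_bytes = int(df.memory_usage(deep=True).sum())
```

Note that `memory_bytes` reports the size of the loaded frame, not the parser's peak memory use during the read, so it is only a rough guide to the overhead asked about above.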

alexander-griffen commented 1 year ago

Closed, as the tender_import script has changed and no longer produces this warning.