juliema / label_reconciliations

Code for reconciling multiple transcriptions for a label
MIT License

workflow_id dtype confusion #37

Closed PmasonFF closed 6 years ago

PmasonFF commented 6 years ago

reconcile.py requires the dtype of workflow_id to be int, but with the input data file data\notes-from-nature-classifications.8.25.16.csv, line 22 of nfn.py empties the dataframe unless it is changed from

df = remove_rows_not_in_workflow(df, workflow_id)

to

df = remove_rows_not_in_workflow(df, str(workflow_id))
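The mismatch can be reproduced with a minimal sketch. The body of remove_rows_not_in_workflow below is an assumption (the real function in nfn.py may differ), and the sample rows are made up:

```python
import pandas as pd


# Hypothetical stand-in for the helper in nfn.py; the real one may differ.
def remove_rows_not_in_workflow(df, workflow_id):
    return df.loc[df["workflow_id"] == workflow_id, :]


# Made-up sample rows; workflow_id is held as Python strings (object dtype),
# as it would be if the CSV column were read in as text.
df = pd.DataFrame(
    {"workflow_id": ["1234", "1234", "5678"], "subject_id": ["a", "b", "c"]},
    dtype=object,
)

workflow_id = 1234  # int, as reconcile.py requires

# String column compared with an int: no element is equal, so nothing is kept.
empty = remove_rows_not_in_workflow(df, workflow_id)

# String column compared with a string: the matching rows survive.
kept = remove_rows_not_in_workflow(df, str(workflow_id))

print(len(empty), len(kept))
```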

rafelafrance commented 6 years ago

The change is in. But for my own education, what data caused this? And what version of Python are you using?

PmasonFF commented 6 years ago

Data - I was trying to get reconcile.py working with the nfn classification file in the repository data folder - "notes-from-nature-classifications.8.25.16.csv"

As far as I can tell, the dtype of that file's workflow_id is string. Note that my wording of the issue suggests the problem could be that particular file, but aren't all the Zooniverse classification download fields strings?

Version of Python - 3.6.2, BUT the environment is patched together since I am on Windows and pip install -r requirements.txt bombed! I installed SciPy from http://www.lfd.uci.edu/~gohlke/ since there are no Windows binaries from SciPy. I THINK I got everything else I need (I get no missing-module errors), but I am not really happy with the environment.

I am also not sure that the patch I suggest is the best fix. It keeps the dataframe from being emptied when none of the string data match the int workflow_id, but there may be other places that need workflow_id to be a string when it is an int. As far as I can tell everything else is working (i.e. the summary gets the correct title, and the reconciliation seems fine).
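One way to avoid deciding which side should be int and which string is to compare both as strings inside the filter itself. This is only a sketch of that idea, not the change that was merged; the function body and column name are assumptions:

```python
import pandas as pd


# Hypothetical variant of the helper: normalize both sides to str before comparing.
def remove_rows_not_in_workflow(df, workflow_id):
    matches = df["workflow_id"].astype(str) == str(workflow_id)
    return df.loc[matches, :]


# Made-up data: the same IDs stored once as strings and once as ints.
as_str = pd.DataFrame({"workflow_id": ["1234", "5678"]}, dtype=object)
as_int = pd.DataFrame({"workflow_id": [1234, 5678]})

# The filter keeps the matching row whichever way the column or argument is typed.
for frame in (as_str, as_int):
    for wanted in (1234, "1234"):
        print(len(remove_rows_not_in_workflow(frame, wanted)))
```

A column that pandas has read as float (for example because of missing values) would stringify as "1234.0" and would still need separate handling.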

rafelafrance commented 6 years ago

Thank you. The patch is fine for our needs. I can clean the old data in the repo.

FYI: I use Linux so I can't offer advice on Windows environments. However, the one module that may trip you up is "python-Levenshtein". You do not have to have it to run the reconciler but if it is missing it will slow things to a crawl.
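A quick way to check whether that optional module is importable in a given environment is sketched below. Levenshtein is the import name that the python-Levenshtein package installs; the helper name has_module is made up for this example:

```python
import importlib.util


# Hypothetical helper: report whether a top-level module can be found
# without actually importing it.
def has_module(name):
    return importlib.util.find_spec(name) is not None


if has_module("Levenshtein"):
    print("python-Levenshtein is available")
else:
    print("python-Levenshtein is missing; string matching may be slow")
```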