Closed damian0604 closed 6 years ago
I'm seeing some problems in the pre-processing stage (unhandled errors about mismatches between x_field and label_field lengths). I'll need to dig a bit deeper to find a root cause and finish the review. I'll try and do this on the 18th of January.
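For reference while debugging, here is a minimal sketch of how such a length mismatch could be surfaced with a clear message instead of an unhandled error. The helper names `extract_xy` and `check_lengths` and the sample documents are hypothetical and not part of inca; the sketch only assumes that documents are dicts and that `x_field` and `label_field` are keys into them.

```python
def check_lengths(x, y):
    """Raise a descriptive error if features and labels differ in length."""
    if len(x) != len(y):
        raise ValueError(
            f"x_field yielded {len(x)} values but label_field yielded {len(y)}"
        )


def extract_xy(docs, x_field, label_field):
    """Collect features and labels pairwise, skipping incomplete documents."""
    x, y = [], []
    for doc in docs:
        # Only keep documents that carry both fields, so x and y stay aligned.
        if x_field in doc and label_field in doc:
            x.append(doc[x_field])
            y.append(doc[label_field])
    return x, y


# Hypothetical sample data: the second document has no label.
docs = [{"text": "a", "label": 1}, {"text": "b"}, {"text": "c", "label": 0}]

# Extracting each field separately produces lists of different lengths.
x_naive = [d["text"] for d in docs if "text" in d]
y_naive = [d["label"] for d in docs if "label" in d]

# Extracting them pairwise keeps them aligned.
x, y = extract_xy(docs, "text", "label")
check_lengths(x, y)
```

Calling `check_lengths(x_naive, y_naive)` on the separately extracted lists raises a `ValueError` that names both counts, which may make the root cause easier to spot.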
I'm going to look into it.
I already see it: the function retrieves the documents from the DB again (see line 121), because the generator is exhausted after the loop that starts in line 94. Unless the option one-pass=True is specified (and the docs are kept in memory), the documents are therefore re-retrieved based on their ID, which means that your data manipulation (changing 10 cases) is gone.
I see that this behavior needs to be handled differently, but do you have a concrete suggestion on what would be the best course of action, @bobvdvelde?
(that still doesn't explain why there are exactly 32 extra observations, though)
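The generator exhaustion described above can be illustrated with a minimal sketch. This is not the inca code: `fetch_docs` is a hypothetical stand-in for the database query, and the document fields are made up for illustration. It only shows why changes made during a first pass are lost when documents are fetched again, and why keeping them in memory preserves the changes.

```python
def fetch_docs():
    """Hypothetical stand-in for a database query that yields documents lazily."""
    for i in range(5):
        yield {"_id": i, "text": f"doc {i}", "label": i % 2}


docs = fetch_docs()

# First pass: modify the documents while iterating over the generator.
modified_ids = []
for doc in docs:
    doc["label"] = 1
    modified_ids.append(doc["_id"])

# A second pass over the same generator yields nothing: it is exhausted.
second_pass = list(docs)

# Fetching again creates fresh documents, so the changes above are lost.
refetched = list(fetch_docs())

# Materialising the generator once keeps the modified documents available.
kept = list(fetch_docs())
for doc in kept:
    doc["label"] = 1
```

Keeping the documents in a list trades memory for consistency, which is presumably the trade-off behind an option that keeps the docs in memory.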
Hi @bobvdvelde , see my latest commit. I think it should be solved.
I ran the same tests as Bob (also with tweets, but with 18K that I had myself in a local version of inca). I'm afraid there's still an error message:
[image] https://user-images.githubusercontent.com/9536412/36024840-4772598c-0d91-11e8-9983-e08c96290687.png
Let me know if I'm doing something wrong in the test, as I did not dive too deep into the documentation of the class.
Thanks for testing! Which version of sklearn are you using?
Hi @damian0604
I updated scikit-learn using the new requirements file you added. The good news is that the error message has now changed... :-)
Let me know if there's something wrong with the test case (I'm using the same code as Bob), or if there's something different I need to do.
Cheers, Theo
Hi Bob, could you please check if it works as you expected? Thanks!!