uvacw / inca

Classification analysis class #312

Closed damian0604 closed 6 years ago

damian0604 commented 6 years ago

Hi Bob, could you please check if it works as you expected? Thanks!!

bobvdvelde commented 6 years ago

I'm seeing some problems in the pre-processing stage (unhandled errors about mismatches between x_field and label_field lengths). I'll need to dig a bit deeper to find a root cause and finish the review. I'll try and do this on the 18th of January.

damian0604 commented 6 years ago

I'm going to look into it.

damian0604 commented 6 years ago

I already see it: the function retrieves the documents from the DB again (see line 121), because the generator is exhausted after the loop starting in line 94. Unless the option one-pass=True is specified (and the docs are kept in memory), the documents are re-retrieved based on their ID, which means that your data manipulation (changing 10 cases) is gone.
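To illustrate the mechanism, here is a minimal sketch of the exhausted-generator problem. It is not the actual inca code: `fetch_docs`, the `db` dict, and the `label` field are hypothetical stand-ins for the real database query and documents.

```python
def fetch_docs(db):
    """Hypothetical stand-in for a DB query: yields a fresh copy of each stored doc."""
    for doc in db.values():
        yield dict(doc)


# Hypothetical "database" with two stored documents.
db = {1: {"id": 1, "label": "a"}, 2: {"id": 2, "label": "b"}}

# First pass: iterate over the generator and manipulate the docs in memory.
docs = fetch_docs(db)
ids = []
for doc in docs:
    doc["label"] = "changed"  # manipulation exists only on this in-memory copy
    ids.append(doc["id"])

# The generator is now exhausted, so a second pass over it yields nothing.
leftover = list(docs)

# Re-retrieving by ID returns the stored versions, so the manipulation is lost.
refetched = [dict(db[i]) for i in ids]

# Keeping the docs in memory (materialising the generator once) preserves the changes.
kept = list(fetch_docs(db))
for doc in kept:
    doc["label"] = "changed"

print(leftover)                        # nothing left to iterate over
print([d["label"] for d in refetched]) # original labels from the "DB"
print([d["label"] for d in kept])      # modified labels
```

Materialising the generator costs memory in proportion to the number of documents, which is presumably why keeping the docs in memory is optional rather than the default.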

I see that this behavior needs to be handled differently, but do you have a concrete suggestion on what would be the best course of action, @bobvdvelde?

damian0604 commented 6 years ago

(that still doesn't explain why there are exactly 32 extra observations, though)

damian0604 commented 6 years ago

Hi @bobvdvelde, see my latest commit. I think it should be solved.

theoaraujo commented 6 years ago

I ran the same tests as Bob (also with tweets, but using 18K tweets that I had in a local version of inca). I'm afraid there's still an error message:

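[image: https://user-images.githubusercontent.com/9536412/36024840-4772598c-0d91-11e8-9983-e08c96290687.png]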

Let me know if I'm doing something wrong in the test, as I did not dive too deep into the documentation of the class.

damian0604 commented 6 years ago

Thanks for testing! Which version of sklearn are you using?

theoaraujo commented 6 years ago

Hi @damian0604

I updated scikit-learn using the new requirements file you added. The good news is that the error message has now changed... :-)

Let me know if it's something wrong with the test case (I'm using the same code as Bob), or if there's something different I need to do.

Cheers, Theo

image