[Open] piskvorky opened this issue 12 years ago
@tmylk I would like to work on this. @piskvorky The link above is broken; could you briefly explain what it was about?
@prakhar2b I think it was a SciPy crash (segfault) when using sparse arrays and indexing an element out of bounds.
I wouldn't say this issue is "easy"; it will need some careful thinking and planning. We definitely don't want to slow down processing too much by, for example, requiring an extra pass over the data just to check for bad values.
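One way to respect the "no extra data pass" constraint is to validate documents lazily, inside the same iteration the model already performs. Below is a minimal sketch, assuming gensim's bag-of-words document format (lists of (feature_id, weight) pairs) and a known vocabulary size num_terms. ValidatingCorpus is a hypothetical name, not an existing gensim class, and the two checks (id range, finite weight) are illustrative choices only.

```python
import math
import numbers
from typing import Iterable, Iterator, List, Tuple

# A bag-of-words document: a list of (feature_id, weight) pairs.
Document = List[Tuple[int, float]]


class ValidatingCorpus:
    """Hypothetical wrapper that checks each document while it is streamed."""

    def __init__(self, corpus: Iterable[Document], num_terms: int):
        self.corpus = corpus
        self.num_terms = num_terms

    def __iter__(self) -> Iterator[Document]:
        for doc_no, doc in enumerate(self.corpus):
            for feature_id, weight in doc:
                # Feature ids must index into the dictionary; an out-of-range id
                # is the kind of mismatch that can later break sparse-matrix code.
                # numbers.Integral also accepts NumPy integer types.
                if not isinstance(feature_id, numbers.Integral) or not (
                    0 <= feature_id < self.num_terms
                ):
                    raise ValueError(
                        f"document #{doc_no}: feature id {feature_id!r} "
                        f"is not in [0, {self.num_terms})"
                    )
                # NaN or infinite weights would silently corrupt model statistics.
                if not math.isfinite(weight):
                    raise ValueError(
                        f"document #{doc_no}: non-finite weight {weight!r}"
                    )
            # The document is yielded unchanged once it passes the checks.
            yield doc
```

Because the checks run as each document is yielded, their cost is proportional to the nonzeros the model is reading anyway, rather than a separate pass. The trade-off is that a bad document is only reported when it is reached, after earlier documents may already have been used for training.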
I would like to work on this issue. Could you please point me to where in the code I should start?
@rasto2211 Adding a warning to LdaModel.__init__ when the input is a list is a good way to start (item 4 above).
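The suggested starting point could look roughly like the helper below. It is a hypothetical sketch: the exact condition and rationale of item 4 are not preserved in this thread, so warn_if_list_corpus and its message are assumptions. In a real patch, a call like this would presumably sit near the top of LdaModel.__init__. It uses the standard-library warnings module, so users can silence or escalate it with the usual warning filters.

```python
import warnings
from typing import Iterable


def warn_if_list_corpus(corpus: Iterable) -> None:
    """Hypothetical check: warn when the corpus is passed as a plain list."""
    if isinstance(corpus, list):
        # Assumed wording; the intended message is not recorded in the thread.
        warnings.warn(
            "corpus was passed as a plain list; please double-check that it is "
            "in the expected bag-of-words format",
            UserWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
```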
@piskvorky @menshikh-iv Do you also want to close this issue, since you closed my PR without merging it?
@rasto2211 No, because the remaining points are still important (see Radim's comment).
Be careful with #1732; I have already seen exactly the same problem twice.
There has been a steady trickle of reports that LSI/LDA misbehave, produce degenerate models, crash Python, etc.
Typically this is a problem with the user's data (bad input data, feature id mismatch, ...), but since gensim targets a wide general audience, this is gensim's "fault" anyway.
Create utility functions that perform basic sanity checks on the user's input data: