nltk / nltk_data

NLTK Data

Move the pickles to a special collection #219

Open ekaf opened 3 months ago

ekaf commented 3 months ago

Now that alternative data packages are available for all the pickles, the question arises: what to do with the old packages?

Simply removing them seems very unsafe for users who are stuck on an old NLTK version that they cannot upgrade: they would be forced to get those packages elsewhere, possibly from dubious sources.

So, what about moving them to a special collection, named for example "Pickles"?

hteeyeoh commented 2 months ago

Does this mean the old NLTK data will be renamed to something else, and that we can keep using punkt for downloads instead of punkt_tab? I ask because we have seen other modules that use an old NLTK library underneath. If they do not move forward, they will keep using an old NLTK version that downloads the data via punkt instead of punkt_tab.

ekaf commented 2 months ago

@hteeyeoh, the collections are XML files that provide thematic lists of NLTK packages. I am proposing to move the pickles to a new list, while keeping their current HTTPS address, so that nothing breaks.
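
To illustrate what "a collection is an XML file listing packages" could mean in practice, here is a minimal sketch that parses a hypothetical "pickles" collection. The `<collection>`/`<item ref="...">` layout, the `pickles` id, and the package ids listed are assumptions made for illustration only; they are not the project's actual file. The function name `package_ids` is likewise invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection document. The element and attribute names, and the
# package ids, are assumptions for illustration, not the real nltk_data file.
PICKLES_XML = """\
<collection id="pickles" name="Legacy pickled packages">
  <item ref="punkt" />
  <item ref="averaged_perceptron_tagger" />
  <item ref="maxent_ne_chunker" />
</collection>
"""


def package_ids(collection_xml: str) -> list[str]:
    """Return the package ids referenced by a collection document."""
    root = ET.fromstring(collection_xml)
    # Keep only <item> elements that actually carry a ref attribute.
    return [item.get("ref") for item in root.iter("item") if item.get("ref")]


print(package_ids(PICKLES_XML))
```

Under this layout, moving a package between collections only changes which list references it; the package itself, and the address it is served from, stay where they are.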

hteeyeoh commented 2 months ago

I see. So this means that modules that have not upgraded their NLTK version can still use the punkt package without triggering the security scan?

ekaf commented 2 months ago

Yes, @hteeyeoh, one purpose of this issue is to discuss how to handle the case when users cannot upgrade to a newer NLTK version.

hteeyeoh commented 2 months ago

Thanks. May I know when this will be ready?

stevenbird commented 2 months ago

@hteeyeoh this is not a time-critical issue, so no promises. I suggest you use whichever punkt package you need.

hteeyeoh commented 2 months ago

Hi @stevenbird, yes, I understand that. Thanks.