open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0
29.02k stars 9.37k forks source link

Issue with NLTK and MM-GDINO? #11362

Open affromero opened 8 months ago

affromero commented 8 months ago

This PR #11298 changed the download directory for nltk, and at least for me it creates a weird \~/nltk_data directory on the project folder (nltk@3.8.1), and not the /home/user/nltk_data that I think is intended here. The other thing is that if I previously downloaded them in another location, these two lines are going to ignore it either way. Is there a reason to force the download directory here? https://github.com/open-mmlab/mmdetection/blob/e5f9f3538fe8f33966befba196fd1c4fab710a6f/mmdet/models/detectors/glip.py#L30-L31

hhaAndroid commented 8 months ago

I apologize. I modified the path at that time because some users needed to download the data offline, and they were unsure about where to place the files. So, I changed it to the home directory to make it easier for users to locate and modify the path.

affromero commented 8 months ago

I was just wondering in the case of installing the project as a package. Of course I can always modify the cached file, but doing it everytime there is a new version of mmdet seems nor right.

hhaAndroid commented 8 months ago

@affromero Yes, do you have a more reasonable solution in mind?

affromero commented 8 months ago

Just from debugging, if I comment those two lines, then nltk.pos_tag(tokens) is going to look through all these folders and if it does not find the files, it raises an error:

LookupError
    - '/home/<user>/nltk_data'
    - '<virtualenv>/nltk_data'
    - '<virtualenv>/share/nltk_data'
    - '<virtualenv>/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

So as long as we have the data in any of those, I think we can safely use nltk.download('punkt') and nltk.download('averaged_perceptron_tagger'), so no download_dir is needed?

hhaAndroid commented 7 months ago

@affromero It may be necessary to clearly document the cache path for new users who may not know where it is located.

affromero commented 7 months ago

All in all this issue is definitely a minor issue, but as I was a new user who had to debug the whole thing to understand it better, maybe some other folks can benefit from that documentation.

ricvo commented 2 months ago

To me ' ~ ' does not get resolved to home, but creates a directory called ' ~ ' inside the current working folder.

Furthermore, those two lines of code trigger a print at every inference step, I believe they they should be moved outside the first function, together with the import, in line 15 same file? To me it works smoothly like that.