Open affromero opened 8 months ago
I apologize. I modified the path at that time because some users needed to download the data offline, and they were unsure about where to place the files. So, I changed it to the home directory to make it easier for users to locate and modify the path.
I was just wondering in the case of installing the project as a package. Of course I can always modify the cached file, but doing it everytime there is a new version of mmdet
seems nor right.
@affromero Yes, do you have a more reasonable solution in mind?
Just from debugging, if I comment those two lines, then nltk.pos_tag(tokens)
is going to look through all these folders and if it does not find the files, it raises an error:
LookupError
- '/home/<user>/nltk_data'
- '<virtualenv>/nltk_data'
- '<virtualenv>/share/nltk_data'
- '<virtualenv>/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
So as long as we have the data in any of those, I think we can safely use nltk.download('punkt')
and nltk.download('averaged_perceptron_tagger')
, so no download_dir
is needed?
@affromero It may be necessary to clearly document the cache path for new users who may not know where it is located.
All in all this issue is definitely a minor issue, but as I was a new user who had to debug the whole thing to understand it better, maybe some other folks can benefit from that documentation.
To me ' ~ ' does not get resolved to home, but creates a directory called ' ~ ' inside the current working folder.
Furthermore, those two lines of code trigger a print at every inference step, I believe they they should be moved outside the first function, together with the import, in line 15 same file? To me it works smoothly like that.
This PR #11298 changed the download directory for nltk, and at least for me it creates a weird
\~/nltk_data
directory on the project folder (nltk@3.8.1
), and not the/home/user/nltk_data
that I think is intended here. The other thing is that if I previously downloaded them in another location, these two lines are going to ignore it either way. Is there a reason to force the download directory here? https://github.com/open-mmlab/mmdetection/blob/e5f9f3538fe8f33966befba196fd1c4fab710a6f/mmdet/models/detectors/glip.py#L30-L31