goru001 / inltk

Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need
https://inltk.readthedocs.io
MIT License

Suppress warnings in console: SourceChangeWarning #69

Closed jayaBalaR closed 3 years ago

jayaBalaR commented 3 years ago

Environment: Google Colaboratory
Runtime: CPU
Installed inltk as directed in the iNLTK documentation:

```
!pip3 install torch==1.3.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
!pip3 install inltk
```

We are using the function `get_similar_sentences()` for our case study, running it over our dataset (size = 9k, split into chunks of 1,000 records). The code logs a warning to the console for each record, slowing down the similar-sentence generation and taking more than 4 hours for just 1k records.
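For reference, here is a minimal sketch of the call we are making, following the iNLTK documentation (the Hindi example sentence and the `'hi'` language code are placeholders, not our actual data):

```python
from inltk.inltk import setup, get_similar_sentences

# One-time model download for the language (documented iNLTK setup step)
setup('hi')

# Generate similar sentences for a single record; our case study makes
# one such call per record, i.e. 1,000 calls per chunk
variants = get_similar_sentences('मैं आज बहुत खुश हूं', 5, 'hi')
print(variants)
```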

We have tried disabling the warnings using Python constructs, but it does not work. We also tried installing the latest version of torch and running the code, but we are still facing the same issue. Could you please help us disable these warnings or provide a fix? Thank you, Jayabalambika.R

goru001 commented 3 years ago

Thanks @jayaBalaR for reaching out. It wouldn't be right to suppress warnings in the library itself. If you want, you can hide all warnings by running this at the start of your notebook (from here):

```python
import warnings
warnings.filterwarnings('ignore')
```
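Alternatively, if a blanket ignore feels too broad, you could filter just the category named in your issue title (assuming the noise is torch's `SourceChangeWarning`):

```python
import warnings
from torch.serialization import SourceChangeWarning

# Suppress only torch's SourceChangeWarning rather than every warning
# (assumes, per the issue title, that this is the category flooding the console)
warnings.filterwarnings('ignore', category=SourceChangeWarning)
```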

Additionally, iNLTK depends on torch v1.3, so updating won't help.

About your original concern of speed: the function currently does take time to run on CPU, and since each one of your sentences is run sequentially, this much time is expected, I guess, and these warnings don't have anything to do with it. If you really want to speed this up, you can run this on GPU by modifying the iNLTK code (the library doesn't support GPU yet, unfortunately; if you feel like it, feel free to raise a PR for GPU support as well).
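To illustrate the kind of change GPU support would need (a generic PyTorch sketch, not iNLTK's actual internals), the model and every input tensor have to land on the same CUDA device:

```python
import torch
import torch.nn as nn

# Generic illustration only -- not iNLTK's actual code. GPU support boils
# down to placing the model and its input tensors on the same CUDA device.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(10, 2).to(device)         # stand-in for the language model
batch = torch.randn(4, 10, device=device)   # stand-in for encoded sentences
with torch.no_grad():
    out = model(batch)
print(out.device)  # cuda:0 on a GPU, cpu otherwise
```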

Or else, worst case, you can segment your data into 10 parts and run 10 different notebooks, each generating augmentations for one of the 10 data parts, given you have enough CPU power at your disposal.
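A minimal sketch of that split, assuming your 9k records sit one sentence per line in a hypothetical `data.txt`:

```python
# Split the records into 10 chunk files, one per notebook run.
# 'data.txt' is a hypothetical input file, one sentence per line.
with open('data.txt', encoding='utf-8') as f:
    records = [line.strip() for line in f if line.strip()]

n_parts = 10
chunk_size = -(-len(records) // n_parts)  # ceiling division
for i in range(n_parts):
    with open(f'chunk_{i}.txt', 'w', encoding='utf-8') as out:
        out.write('\n'.join(records[i * chunk_size:(i + 1) * chunk_size]))
```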

Let me know if I can help in any way.

jayaBalaR commented 3 years ago

@goru001, thank you very much for your prompt response. Actually, we had already tried suppressing the warnings with that code before raising this ticket, but it did not suppress them.

Now:

1. We have segmented our total dataset of 9k records into 10 parts. Running just one part of 1k records takes 4+ hours on CPU, as per the earlier conversation. After trying CPU on Colab we also switched the runtime to GPU, and it still takes the same 4+ hours. I can now see why: even though we switch the runtime to GPU, internally the library runs its CPU implementation only, hence the time (a quick check illustrating this is sketched after this list).

2. Could you please let us know the approximate time needed to provide GPU support, so that we can estimate and plan the parts of our case study?
3. Also, please let us know whether a separate PR needs to be raised for GPU support, or whether you would link this issue to the GPU support work.
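As a quick check of the point in 1) (a standard torch snippet, not from this thread): a Colab GPU runtime does expose a GPU, but torch keeps allocating on CPU unless tensors are explicitly moved:

```python
import torch

# On a Colab GPU runtime the first line prints True, yet new tensors
# still default to CPU -- which is why switching the runtime alone
# doesn't make iNLTK any faster.
print(torch.cuda.is_available())   # True when a GPU is attached
print(torch.tensor([1.0]).device)  # cpu -- nothing moved it to CUDA
```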

Once again thank you very much.

goru001 commented 3 years ago

@jayaBalaR I can't really give an ETA on GPU support, as my hands are full for at least the next 2-3 months. I think you should be fine if you run 10 separate notebooks, each with 1k datapoints; you should be able to get augmentations for ~10k datapoints in ~4 hours, given enough CPU power.

Additionally, there's already this issue for GPU support, so it should suffice.

jayaBalaR commented 3 years ago

@goru001, thank you, noted. We will get back to you in case of any further queries on the same issue.

goru001 commented 3 years ago

Closing this for now since there's already this issue for GPU support; feel free to reopen if there's anything else.