shashank-gv / hugging

SUMMARIZE: Could be quieter #4

Closed gjwgit closed 2 years ago

gjwgit commented 2 years ago

Ideally, an mlhub command outputs only its result. Currently, the following command prints a tokenizer warning and a header line before the summary:

$ ml summarize hugging microsoft_text.txt
/home/graham/.mlhub/hugging/.python/transformers/models/t5/tokenization_t5_fast.py:156: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-small automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
  warnings.warn(
Summary of microsoft_text.txt between 20 and 50 words below:

the program will support around 100 institutions with AI infrastructure, course content and curriculum . it will provide AI development tools and Azure AI services . the program is an attempt to ramp up the institutional set-up and build capabilities .

Ideally:

$ ml summarize hugging microsoft_text.txt
the program will support around 100 institutions with AI infrastructure, course content and curriculum . it will provide AI development tools and Azure AI services . the program is an attempt to ramp up the institutional set-up and build capabilities .

shashank-gv commented 2 years ago

Added a --verbose (bool) option (default: False). Together with os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' (which stops TensorFlow from logging its warnings and errors), this should clean up the output. Users can still see the warnings by passing --verbose True. Package deprecation warnings (should any come up) are still shown by default.
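
A minimal sketch of the approach described above, assuming a plain argparse entry point. The names str_to_bool, parse_args and configure_quiet_mode are hypothetical, and limiting the suppressed warnings to FutureWarnings raised inside transformers is an assumption chosen so that other package warnings stay visible by default. It is not the package's actual code.

```python
import argparse
import os
import warnings


def str_to_bool(value):
    """Parse 'True'/'False'-style strings so the flag works as `--verbose True`.

    argparse's type=bool would not work here: bool("False") is True.
    """
    lowered = value.strip().lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def parse_args(argv=None):
    """Parse the (hypothetical) options of the summarize command."""
    parser = argparse.ArgumentParser(description="Summarise a text file.")
    parser.add_argument("path", help="text file to summarise")
    parser.add_argument("--verbose", type=str_to_bool, default=False,
                        help="show library warnings and extra messages")
    return parser.parse_args(argv)


def configure_quiet_mode(verbose):
    """Silence noisy third-party output unless --verbose True was given."""
    if verbose:
        return
    # Hides INFO, WARNING and ERROR messages from TensorFlow's C++ backend.
    # Only takes effect if set before TensorFlow is first imported.
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
    # Assumption: ignore only FutureWarnings issued from transformers modules
    # (such as the t5 tokenizer notice); all other warnings stay visible.
    warnings.filterwarnings("ignore", category=FutureWarning,
                            module="transformers")
```

In the script's entry point, configure_quiet_mode(parse_args().verbose) would need to run before TensorFlow or transformers is imported, since TF_CPP_MIN_LOG_LEVEL is read when TensorFlow loads.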

gjwgit commented 2 years ago

Please link to the commit that fixed this, e.g. 1dea1010528b42615f1a77787df958a85fa6f755

gjwgit commented 2 years ago

It is still printing "Summary of microsoft_text.txt between 20 and 70 words below:". That line should only be printed with --verbose.

$ ml summarize hugging microsoft_text.txt 
Summary of microsoft_text.txt between 20 and 70 words below:

the program will support around 100 institutions with AI infrastructure, course content and curriculum . it will provide AI development tools and Azure AI services . the program is an attempt to ramp up the institutional set-up and build capabilities .
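
The behaviour requested in this comment, showing the header line only with --verbose, can be sketched as below. render_summary and its min_words/max_words parameters are hypothetical; the defaults of 20 and 70 are taken from the header text above, not from the package's code.

```python
def render_summary(path, summary, verbose=False, min_words=20, max_words=70):
    """Return the text the command would print: just the summary unless verbose."""
    lines = []
    if verbose:
        # Informational header, worded like the output shown above.
        lines.append(f"Summary of {path} between {min_words} and {max_words} words below:")
        lines.append("")
    lines.append(summary)
    return "\n".join(lines)
```

The caller would then print(render_summary(args.path, summary, args.verbose)), so the default output is the summary alone.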

gjwgit commented 2 years ago

Now working just fine:

$ ml summarize hugging microsoft_text.txt
the program will support around 100 institutions with AI infrastructure, course content and curriculum . it will provide AI development tools and Azure AI services . the program is an attempt to ramp up the institutional set-up and build capabilities .