Closed pantelis closed 10 months ago
Closing this issue after going through some posts in the HF community, e.g. https://discuss.huggingface.co/t/trojan-in-common-voice-dataset/18155, which indicate that these are indeed false positives: the files may contain the source code of viruses, or some string that matches a virus signature.
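To illustrate how a plain-text file can trip a signature-based scanner, here is a minimal sketch. It assumes a scanner that does nothing more than search for a known byte pattern; the signature name, the marker string, and the `scan_bytes` helper are all hypothetical, and this is not how the HF scanner is actually implemented.

```python
# Hypothetical signature table: name -> byte pattern.
# Real scanners use far richer rule formats; this is only an illustration.
SIGNATURES = {"Demo.Trojan.Example": b"EVIL_PAYLOAD_MARKER"}


def scan_bytes(data: bytes) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]


# A harmless source file that merely mentions the marker string,
# as a code dataset containing security-related repositories might.
harmless_source = b'# detection test\nMARKER = "EVIL_PAYLOAD_MARKER"\n'
clean_source = b'print("hello world")\n'

flagged = scan_bytes(harmless_source)    # matches on the string alone
not_flagged = scan_bytes(clean_source)   # no pattern present

print(flagged)
print(not_flagged)
```

The first file is flagged even though nothing in it is executable malware, which is the kind of false positive described in the linked thread.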
Information
The question or comment is about chapter:
Question or comment
In https://huggingface.co/datasets/transformersbook/codeparrot there is one file that is flagged as unsafe with the information "Virus: Legacy.Trojan.Agent-37025". Has anyone else verified this, or is it a false positive of the virus scanners used by HF? In any case, does anyone know of, or has anyone used, an alternative dataset?