google / magika

Detect file content types with deep learning
https://google.github.io/magika/
Apache License 2.0

Add more "basic" tests samples to cover supported content types #662

Open reyammer opened 2 weeks ago

reyammer commented 2 weeks ago

The new model "standard_v2_0" supports 200+ content types: https://github.com/google/magika/tree/main/assets/models/standard_v2_0/README.md

Ideally, we would have at least one "basic sample" for each of the supported content types (see /tests_data/basic/*).
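One rough way to find the gaps is to compare the list of supported content types against what is already on disk. The sketch below is only an illustration: it assumes each content type has its own subdirectory under the samples directory, and the function name `missing_basic_samples` is hypothetical, not part of Magika.

```python
from pathlib import Path


def missing_basic_samples(basic_dir, supported_types):
    """Return the content types that have no sample file under basic_dir.

    Assumption (not confirmed by the issue): samples are stored as
    <basic_dir>/<content_type>/<any file name>.
    """
    basic_dir = Path(basic_dir)
    missing = []
    for content_type in sorted(supported_types):
        type_dir = basic_dir / content_type
        # A type counts as covered only if its directory holds at least one file.
        has_sample = type_dir.is_dir() and any(
            p.is_file() for p in type_dir.iterdir()
        )
        if not has_sample:
            missing.append(content_type)
    return missing
```

The list of supported types would have to come from the model's README or configuration; it is passed in explicitly here so that the sketch makes no claim about where Magika stores it.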

This issue acts as a call to action -- external help is very welcome!

Important aspects to keep in mind:

mamamia96 commented 2 weeks ago

I'd like to add a handful of basic tests for:

reyammer commented 2 weeks ago

These would be very welcome! As indicated in the issue, please include a description of how these files were created (especially for the binary ones, such as pickle). As an example of how we created some of the test cases: we created a new Google Doc, then used "export as" to produce various formats. Thanks!
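For a binary sample such as pickle, a short generation script can double as the description of how the file was created. The following is a minimal sketch, not the project's own procedure: the function name `write_pickle_sample`, the payload, and the output file name are all illustrative assumptions.

```python
import pickle
from pathlib import Path


def write_pickle_sample(path, protocol=4):
    """Serialize a small, fixed payload so the sample can be regenerated.

    The payload is arbitrary; it only needs to be harmless and reproducible.
    """
    payload = {
        "name": "basic-sample",
        "values": [1, 2, 3],
        "nested": {"ok": True},
    }
    # Pinning the protocol keeps the output independent of the Python default.
    data = pickle.dumps(payload, protocol=protocol)
    Path(path).write_bytes(data)
    return payload


if __name__ == "__main__":
    # Hypothetical output name; the real location is up to the PR author.
    write_pickle_sample("doc.pickle")
```

Pasting a script like this into the PR description, together with the Python version used, gives reviewers enough information to recreate the file.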

mamamia96 commented 2 weeks ago

Where should I include my description of how I created the files?

mamamia96 commented 2 weeks ago

Where should I include my description of how I created the files?

Sorry, I reread the issue and now see that it should be included in the PR.