zumult-org / zumultapi


Search Indexer: specify file extension? #149

Closed berndmoos closed 1 year ago

berndmoos commented 1 year ago

For a given input directory, the indexer will try to index all files in that directory:

https://github.com/zumult-org/zumultapi/blob/40d982acbd200b28e17f16cc2eab5b33280b47a1/src/main/java/org/zumult/query/searchEngine/MTASBasedSearchEngine.java#L327C13-L327C13

This is inconvenient for COMA corpora, because they usually contain audio/video files in the same folder as the transcript files. @EleFri: would it do any harm to have that line list only files with the extension *.xml? An even cleaner solution would be to check whether each file is actually an ISO/TEI file, but that may slow down indexing.
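A minimal sketch of both options, for illustration only. The class and method names (`XmlFileLister`, `listXmlFiles`, `looksLikeTei`) are hypothetical and not taken from `MTASBasedSearchEngine`. The TEI check assumes that an ISO/TEI transcript has a root element `TEI` in the namespace `http://www.tei-c.org/ns/1.0`, and it reads only as far as the first start tag, which should keep the extra cost per file small.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Locale;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XmlFileLister {

    // Assumption: ISO/TEI transcripts use the standard TEI namespace.
    static final String TEI_NS = "http://www.tei-c.org/ns/1.0";

    /** Option 1: keep only regular files whose name ends in ".xml" (case-insensitive). */
    static File[] listXmlFiles(File dir) {
        File[] files = dir.listFiles((File f) ->
                f.isFile() && f.getName().toLowerCase(Locale.ROOT).endsWith(".xml"));
        if (files == null) {
            // listFiles returns null if dir is not a directory or cannot be read.
            return new File[0];
        }
        Arrays.sort(files);
        return files;
    }

    /** Option 2: additionally peek at the root element and stop at the first start tag. */
    static boolean looksLikeTei(File file) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or resolve external entities while peeking.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try (InputStream in = new FileInputStream(file)) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return "TEI".equals(reader.getLocalName())
                            && TEI_NS.equals(reader.getNamespaceURI());
                }
            }
            return false;
        } catch (IOException | XMLStreamException e) {
            // Unreadable or not well-formed XML: treat as "not a TEI file".
            return false;
        }
    }
}
```

With this, the indexer could iterate over `listXmlFiles(inputDirectory)` instead of every file in the directory, and optionally skip files for which `looksLikeTei` returns `false`.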

EleFri commented 1 year ago

done