lpinner / metageta

Metadata Gathering, Extraction and Transformation Application - Unmaintained

Specify ignore/black list #52

Closed alexlopespereira closed 8 years ago

alexlopespereira commented 8 years ago

I would like to specify a set of keywords so that, when metageta finds any of them in a file path, it does not add the corresponding image to the destination shapefile.

Is it easy to do that?

Thank you, Alex

alexlopespereira commented 8 years ago

I looked at the code, and I guess I could put a check in this part of the runcrawler.py file.

    # Loop through dataset objects returned by Crawler
    for ds in Crawler:
        try:
            logger.debug('Attempting to open %s'%Crawler.file)
            fi=ds.fileinfo
            fi['filepath']=utilities.uncpath(fi['filepath'])
            # Check whether the filepath contains any keyword from the blacklist.
            # If it does, skip to the next file (next loop iteration).
            # Question: this would be rather inefficient when the keyword is a directory name, which is my goal.
            # Any suggestions on how to do that?

lpinner commented 8 years ago

@alexlopespereira are you wanting to exclude entire directories, filenames or both? Do you wish to exclude partial file/directory string matches?

alexlopespereira commented 8 years ago

I would like to exclude entire directories, and also files with a given extension such as jpg and png. A solution based on an expression like "LIKE %keyword%" would be good because it is generic and would cover any user request. Would you suggest an approach? Could you point me to the part of the code that would be suitable for implementing this feature?

Thanks.
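
A "LIKE %keyword%" rule amounts to a substring test on the path. A minimal sketch of that idea, assuming a hypothetical helper named `is_blacklisted` and a plain list of keywords (neither is part of MetaGETA):

```python
def is_blacklisted(filepath, keywords):
    """Return True if any keyword occurs anywhere in filepath.

    Hypothetical helper for illustration only; not MetaGETA code.
    """
    # Normalise separators and case so 'temp' also matches 'C:\\data\\TEMP\\a.tif'.
    haystack = filepath.replace('\\', '/').lower()
    return any(kw.lower() in haystack for kw in keywords)


keywords = ['temporary', '.jpg', '.png']
print(is_blacklisted('/images/temporary/scene1.tif', keywords))  # True
print(is_blacklisted('/images/final/scene1.tif', keywords))      # False
```

A substring rule is deliberately broad: the keyword `.jpg` would also exclude a file such as `scene.jpg.aux.xml`, so it may match more than intended.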

lpinner commented 8 years ago

@alexlopespereira try the develop branch. I've implemented simple file/dir exclusions.

alexlopespereira commented 8 years ago

@lpinner that is perfect, excellent. Now I can use the error log as a burndown/to-do list for my image repository folder. I would just suggest explaining in the parameter help that the syntax is a space-separated, quoted list, or adding an example, since the correct syntax might not be obvious. For example, I also tried -e '_.jpg','temporary','.bmp', and another attempt was -e '.jpg' -e 'temporary' -e '_.bmp'. Thank you very much.
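
To illustrate why the comma-separated form does not yield separate patterns, here is a small sketch that assumes the exclusion value is split on whitespace. The thread does not show how MetaGETA actually parses `-e`, so `parse_exclusions` is a hypothetical stand-in:

```python
import shlex


def parse_exclusions(value):
    """Split a space-delimited string of (optionally quoted) patterns.

    Hypothetical stand-in for illustration; MetaGETA's real parsing may differ.
    """
    # shlex.split applies shell-like rules: whitespace separates tokens,
    # quotes are removed, and adjacent quoted pieces are joined together.
    return shlex.split(value)


# Space-delimited: three separate patterns.
print(parse_exclusions("*.png somedir? img_[0-9][0-9].tif"))
# Comma-separated: no whitespace, so everything collapses into one token.
print(parse_exclusions("'_.jpg','temporary','.bmp'"))
```

Under this assumption the comma-separated attempt is read as a single pattern containing literal commas, which would match nothing useful.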

lpinner commented 8 years ago

Parameter help updated to 'Space delimited glob style file/directory basename exclusion pattern/s i.e *.png somedir? img_[0-9][0-9].tif'
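
The glob-style basename matching described in that help text can be sketched with Python's standard `fnmatch` module. This is an illustration only: `is_excluded` and `walk_filtered` are hypothetical names, and the develop branch may implement the exclusion differently.

```python
import fnmatch
import os


def is_excluded(path, patterns):
    """Return True if the basename of path matches any glob pattern."""
    # normpath drops a trailing separator so '/data/somedir1/' still has a basename.
    name = os.path.basename(os.path.normpath(path))
    return any(fnmatch.fnmatch(name, pat) for pat in patterns)


def walk_filtered(root, patterns):
    """Yield file paths under root, skipping excluded files and directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into
        # excluded directories, so their contents are never visited.
        dirnames[:] = [d for d in dirnames if not is_excluded(d, patterns)]
        for name in filenames:
            if not is_excluded(name, patterns):
                yield os.path.join(dirpath, name)


patterns = ['*.png', 'somedir?', 'img_[0-9][0-9].tif']
print(is_excluded('/data/scene/thumb.png', patterns))    # True
print(is_excluded('/data/scene/landsat.tif', patterns))  # False
```

Pruning at the directory level also addresses the efficiency concern raised earlier in the thread: an excluded directory is skipped once, rather than every file inside it being tested individually.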