mbari-org / SeafloorMappingDB

Make MBARI seafloor mapping datasets more accessible and useful
GNU General Public License v3.0

Exclude.list management #249

Open jbpaduan opened 5 months ago

jbpaduan commented 5 months ago

The exclude.list, which names the directories in SeafloorMapping that the SMDB load script should skip, has become cumbersome, will only get longer, and must be managed by someone with a working SMDB Docker environment. This issue proposes changes to improve its management by changing how 1) the exclude.list is constructed, 2) additions and removals are made, and 3) the list can be reviewed.

1) Spreadsheets in the year/SMDB directories, named like 2023/exclude_list.xlsx, will contain the paths of directories to be excluded, and the load script will concatenate these source spreadsheets to regenerate the exclude.list file in the repository.
2) Additions and removals can be made by editing the source spreadsheets.
3) From the header of the SMDB website, a pop-up window can be opened that lists the contents of the exclude.list, in descending year order and then alphabetically, reflecting what the database thinks should be excluded. Presumably the Load Log Output should also flag each path that isn't found with a warning, making it obvious that there might be a typo in a source spreadsheet.
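The ordering in point 3 and the missing-path warning could look roughly like the following minimal sketch. It is not SMDB code: it assumes exclude paths are relative to the SeafloorMapping root, takes the year from the first 4-digit run in the top-level directory name (so "OceanImaging2017" sorts as 2017 and "mbsystem" has no year), and all names (`top_level_year`, `display_sort_key`, `warn_missing`, the logger name) are hypothetical.

```python
from __future__ import annotations

import logging
import re
from pathlib import Path, PurePosixPath
from typing import Optional

logger = logging.getLogger("smdb.exclude_list")  # hypothetical logger name


def top_level_year(path: str) -> Optional[int]:
    """Return the first 4-digit year in the path's top-level directory, or None."""
    parts = PurePosixPath(path).parts
    if not parts:
        return None
    match = re.search(r"\d{4}", parts[0])
    return int(match.group()) if match else None


def display_sort_key(path: str) -> tuple:
    """Sort by year descending, then alphabetically; year-less paths go last."""
    year = top_level_year(path)
    if year is None:
        return (1, 0, path)
    return (0, -year, path)


def warn_missing(paths: list[str], root: Path) -> list[str]:
    """Log a warning for each listed path that does not exist under root."""
    missing = [p for p in paths if not (root / p).exists()]
    for p in missing:
        logger.warning("Exclude path not found (possible typo in spreadsheet): %s", root / p)
    return missing
```

With this key, `sorted(paths, key=display_sort_key)` puts all 2024 entries first, then 2019, then MappingAUVOps2006, with directories such as mbsystem or swathdata at the end.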

MBARIMike commented 5 months ago

I suggest naming the exclude_list files like:

2023/exclude_list_2023.xlsx
OceanImaging2017/exclude_list_OceanImaging2017.xlsx

Including the parent directory name in the .xlsx file name will avoid confusion if multiple spreadsheets are opened in Excel.

I suggest moving the button for the Load Log Output from the home page back to the header (undoing https://github.com/mbari-org/SeafloorMappingDB/commit/4133a336b3418e276d0dfceacbd6d6bbd8446a35 and https://github.com/mbari-org/SeafloorMappingDB/commit/64155336da96e565558408caddc3e127d8588880) so that it's visible from all the other pages of the site. I often find myself on the missions or compilations page wanting to check the load log to see why something is the way it is. It'd be nice to have simple hrefs with target='_blank' to the Load Log Output and the exclude list in the header so that both are easily accessible from every page on the site.

MBARIMike commented 5 months ago

On second thought, I propose naming the exclude_list files like:

/Volumes/SeafloorMapping/2024/SMDB/2024_exclude_list.csv
/Volumes/SeafloorMapping/MappingAUVOps2006/SMDB/MappingAUVOps2006_exclude_list.csv

This is analogous to how the survey_tally files are named. I also propose using the same workflow as for the survey_tally files:

  1. Existing exclude_list paths will be written to a .csv file
  2. The .csv file will be imported into Excel, saved as an .xlsx file, and edited there
  3. The exclude_list paths used during the load will be those that are in the .xlsx files
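Step 1 amounts to grouping the existing exclude.list entries by top-level directory and writing one .csv per group using the proposed `<parent>/SMDB/<parent>_exclude_list.csv` naming. The sketch below assumes entries are relative to the SeafloorMapping root and that each .csv holds one path per row with no header; `exclude_csv_path` and `write_exclude_csvs` are hypothetical names, not the `write_exclude_path_csvs()` that actually lives in load.py.

```python
import csv
from collections import defaultdict
from pathlib import Path, PurePosixPath


def exclude_csv_path(root: Path, parent: str) -> Path:
    """<root>/<parent>/SMDB/<parent>_exclude_list.csv, per the proposed naming."""
    return root / parent / "SMDB" / f"{parent}_exclude_list.csv"


def write_exclude_csvs(exclude_paths, root: Path) -> dict:
    """Group paths by top-level directory and write one sorted .csv per group.

    Returns a mapping of written .csv path -> number of paths written.
    """
    groups = defaultdict(list)
    for p in exclude_paths:
        parts = PurePosixPath(p).parts
        if parts:  # skip blank entries
            groups[parts[0]].append(p)

    written = {}
    for parent in sorted(groups):
        out = exclude_csv_path(root, parent)
        out.parent.mkdir(parents=True, exist_ok=True)
        with out.open("w", newline="") as f:
            writer = csv.writer(f)
            for p in sorted(groups[parent]):
                writer.writerow([p])  # one path per row, no header (assumption)
        written[out] = len(groups[parent])
    return written
```

Returning the per-file counts makes it easy to log a "Wrote N paths to ..." line for each file.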

Once all of the .xlsx files have been created from the .csv files, I'll change the load.py logic to use them instead of the repo's smdb/config/exclude.list file.
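Switching load.py to the spreadsheets first requires finding them. A minimal sketch, assuming the `<parent>/SMDB/<parent>_exclude_list.xlsx` layout proposed above; `find_exclude_xlsxs` is a hypothetical name. The exact-name check matters because a bare glob would also pick up misnamed copies and the `~$...` lock files Excel leaves next to a workbook while it's open.

```python
from __future__ import annotations

from pathlib import Path


def find_exclude_xlsxs(root: Path) -> list[Path]:
    """Return every <root>/<parent>/SMDB/<parent>_exclude_list.xlsx that exists.

    Files whose name prefix doesn't match the parent directory (including
    Excel "~$" lock files) are skipped.
    """
    found = []
    for xlsx in sorted(root.glob("*/SMDB/*_exclude_list.xlsx")):
        parent = xlsx.parent.parent.name
        if xlsx.name == f"{parent}_exclude_list.xlsx":
            found.append(xlsx)
    return found
```

Reading the rows out of each file needs an .xlsx reader such as openpyxl (e.g. `load_workbook(path, read_only=True)`); the thread doesn't say which library load.py uses.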

MBARIMike commented 5 months ago

https://github.com/mbari-org/SeafloorMappingDB/pull/254 has been pulled to production and executed with this output:

INFO 2024-06-05 19:54:42,369 load.py read_config_exclude_list():1760 Read 177 paths to exclude from /app/config/exclude.list
INFO 2024-06-05 19:54:42,519 load.py read_exclude_path_xlsxs():1783 Read 6 paths to exclude from /mbari/SeafloorMapping/2019/SMDB/2019_exclude_list.xlsx
INFO 2024-06-05 19:54:42,550 load.py write_exclude_path_csvs():1808 Wrote 6 paths to /mbari/SeafloorMapping/2019/SMDB/2019_exclude_list.csv
INFO 2024-06-05 19:54:42,553 load.py write_exclude_path_csvs():1808 Wrote 6 paths to /mbari/SeafloorMapping/2020/SMDB/2020_exclude_list.csv
INFO 2024-06-05 19:54:42,555 load.py write_exclude_path_csvs():1808 Wrote 4 paths to /mbari/SeafloorMapping/2021/SMDB/2021_exclude_list.csv
INFO 2024-06-05 19:54:42,557 load.py write_exclude_path_csvs():1808 Wrote 18 paths to /mbari/SeafloorMapping/2022/SMDB/2022_exclude_list.csv
INFO 2024-06-05 19:54:42,559 load.py write_exclude_path_csvs():1808 Wrote 23 paths to /mbari/SeafloorMapping/2024/SMDB/2024_exclude_list.csv
INFO 2024-06-05 19:54:42,561 load.py write_exclude_path_csvs():1808 Wrote 9 paths to /mbari/SeafloorMapping/MappingAUVOps2006/SMDB/MappingAUVOps2006_exclude_list.csv
INFO 2024-06-05 19:54:42,563 load.py write_exclude_path_csvs():1808 Wrote 5 paths to /mbari/SeafloorMapping/MappingAUVOps2007/SMDB/MappingAUVOps2007_exclude_list.csv
INFO 2024-06-05 19:54:42,565 load.py write_exclude_path_csvs():1808 Wrote 7 paths to /mbari/SeafloorMapping/MappingAUVOps2008/SMDB/MappingAUVOps2008_exclude_list.csv
INFO 2024-06-05 19:54:42,567 load.py write_exclude_path_csvs():1808 Wrote 8 paths to /mbari/SeafloorMapping/MappingAUVOps2009/SMDB/MappingAUVOps2009_exclude_list.csv
INFO 2024-06-05 19:54:42,569 load.py write_exclude_path_csvs():1808 Wrote 8 paths to /mbari/SeafloorMapping/MappingAUVOps2010/SMDB/MappingAUVOps2010_exclude_list.csv
INFO 2024-06-05 19:54:42,571 load.py write_exclude_path_csvs():1808 Wrote 5 paths to /mbari/SeafloorMapping/MappingAUVOps2011/SMDB/MappingAUVOps2011_exclude_list.csv
INFO 2024-06-05 19:54:42,573 load.py write_exclude_path_csvs():1808 Wrote 4 paths to /mbari/SeafloorMapping/MappingAUVOps2012/SMDB/MappingAUVOps2012_exclude_list.csv
INFO 2024-06-05 19:54:42,575 load.py write_exclude_path_csvs():1808 Wrote 3 paths to /mbari/SeafloorMapping/MappingAUVOps2013/SMDB/MappingAUVOps2013_exclude_list.csv
INFO 2024-06-05 19:54:42,577 load.py write_exclude_path_csvs():1808 Wrote 3 paths to /mbari/SeafloorMapping/MappingAUVOps2014/SMDB/MappingAUVOps2014_exclude_list.csv
INFO 2024-06-05 19:54:42,579 load.py write_exclude_path_csvs():1808 Wrote 5 paths to /mbari/SeafloorMapping/MappingAUVOps2015/SMDB/MappingAUVOps2015_exclude_list.csv
INFO 2024-06-05 19:54:42,581 load.py write_exclude_path_csvs():1808 Wrote 14 paths to /mbari/SeafloorMapping/MappingAUVOps2016/SMDB/MappingAUVOps2016_exclude_list.csv
INFO 2024-06-05 19:54:42,583 load.py write_exclude_path_csvs():1808 Wrote 7 paths to /mbari/SeafloorMapping/MappingAUVOps2017/SMDB/MappingAUVOps2017_exclude_list.csv
INFO 2024-06-05 19:54:42,585 load.py write_exclude_path_csvs():1808 Wrote 2 paths to /mbari/SeafloorMapping/MappingAUVOps2018/SMDB/MappingAUVOps2018_exclude_list.csv
INFO 2024-06-05 19:54:42,587 load.py write_exclude_path_csvs():1808 Wrote 1 paths to /mbari/SeafloorMapping/MappingAUVOpsStuff/SMDB/MappingAUVOpsStuff_exclude_list.csv
INFO 2024-06-05 19:54:42,589 load.py write_exclude_path_csvs():1808 Wrote 1 paths to /mbari/SeafloorMapping/OceanImaging2012/SMDB/OceanImaging2012_exclude_list.csv
INFO 2024-06-05 19:54:42,591 load.py write_exclude_path_csvs():1808 Wrote 12 paths to /mbari/SeafloorMapping/OceanImaging2013/SMDB/OceanImaging2013_exclude_list.csv
INFO 2024-06-05 19:54:42,593 load.py write_exclude_path_csvs():1808 Wrote 10 paths to /mbari/SeafloorMapping/OceanImaging2014/SMDB/OceanImaging2014_exclude_list.csv
INFO 2024-06-05 19:54:42,595 load.py write_exclude_path_csvs():1808 Wrote 6 paths to /mbari/SeafloorMapping/OceanImaging2015/SMDB/OceanImaging2015_exclude_list.csv
INFO 2024-06-05 19:54:42,597 load.py write_exclude_path_csvs():1808 Wrote 5 paths to /mbari/SeafloorMapping/OceanImaging2016/SMDB/OceanImaging2016_exclude_list.csv
INFO 2024-06-05 19:54:42,599 load.py write_exclude_path_csvs():1808 Wrote 3 paths to /mbari/SeafloorMapping/OceanImaging2018/SMDB/OceanImaging2018_exclude_list.csv
INFO 2024-06-05 19:54:42,601 load.py write_exclude_path_csvs():1808 Wrote 1 paths to /mbari/SeafloorMapping/mbsystem/SMDB/mbsystem_exclude_list.csv
INFO 2024-06-05 19:54:42,603 load.py write_exclude_path_csvs():1808 Wrote 1 paths to /mbari/SeafloorMapping/swathdata/SMDB/swathdata_exclude_list.csv
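One sanity check on output like this is that no paths were lost in the split. The hypothetical helper below simply regex-matches the "Read N paths to exclude from ..." and "Wrote N paths to ..." messages shown above; it assumes only that their wording stays as it appears in this log.

```python
import re

READ_RE = re.compile(r"Read (\d+) paths to exclude from (\S+)")
WROTE_RE = re.compile(r"Wrote (\d+) paths to (\S+)")


def tally_exclude_log(lines):
    """Return ({source: n_read}, {csv_file: n_written}) parsed from load.py log lines."""
    read, written = {}, {}
    for line in lines:
        m = READ_RE.search(line)
        if m:
            read[m.group(2)] = int(m.group(1))
            continue
        m = WROTE_RE.search(line)
        if m:
            written[m.group(2)] = int(m.group(1))
    return read, written
```

Applied to the output above, the 27 per-directory .csv counts sum to 177, the same number read from /app/config/exclude.list.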

File /mbari/SeafloorMapping/2019/SMDB/2019_exclude_list.xlsx was created from the corresponding .csv file as a test. The remaining .csv files need to be converted to .xlsx files, where new edits can then be made.

MBARIMike commented 5 months ago

The .xlsx -> load -> .csv workflow is now in place in production, with the sorted, consolidated exclude list written to https://smdb.shore.mbari.org/media/logs/exclude_list.txt
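Producing a consolidated list like exclude_list.txt amounts to merging the per-directory lists, trimming and de-duplicating entries, sorting, and writing one path per line. A minimal sketch, assuming the per-file paths have already been read into a dict; `write_consolidated` is a hypothetical name, and plain alphabetical order is an assumption since the thread doesn't state the sort used in production. Reporting duplicates is an extra that would surface a path entered in two spreadsheets.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def write_consolidated(per_file: dict[str, list[str]], out: Path) -> tuple[list[str], list[str]]:
    """Merge per-directory exclude lists into one sorted, de-duplicated text file.

    Returns (merged_paths, duplicated_paths); a path counts as duplicated if it
    appears more than once across all source lists.
    """
    counts = Counter(p.strip() for paths in per_file.values() for p in paths if p.strip())
    merged = sorted(counts)
    duplicates = sorted(p for p, n in counts.items() if n > 1)
    out.write_text("".join(f"{p}\n" for p in merged))
    return merged, duplicates
```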

MBARIMike commented 5 months ago

Last weekend's load failed to exclude any exclude_paths because the logic was changed in https://github.com/mbari-org/SeafloorMappingDB/pull/258. https://github.com/mbari-org/SeafloorMappingDB/pull/259 should fix this.
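A regression like this (paths read, but nothing excluded) is cheap to flag at load time. The sketch below is hypothetical and doesn't reflect how load.py actually matches paths: it assumes a directory is excluded when it equals an exclude path or lies beneath one, compares whole path components so "2024/ab" isn't caught by "2024/a", and logs a warning when a non-empty exclude list matches nothing. That situation is suspicious but not necessarily wrong, so it warns rather than fails.

```python
import logging
from pathlib import PurePosixPath

logger = logging.getLogger("smdb.exclude_list")  # hypothetical logger name


def filter_excluded(candidate_dirs, exclude_paths):
    """Split candidate directories into (kept, excluded).

    A directory is excluded if it equals an exclude path or lies beneath one.
    """
    excludes = [PurePosixPath(p) for p in exclude_paths]
    kept, excluded = [], []
    for d in candidate_dirs:
        path = PurePosixPath(d)
        if any(path == e or e in path.parents for e in excludes):
            excluded.append(d)
        else:
            kept.append(d)
    # Guard against the failure described above: exclude paths were loaded
    # but none of them matched any candidate directory.
    if excludes and not excluded:
        logger.warning("%d exclude paths loaded but no directories were excluded", len(excludes))
    return kept, excluded
```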