wkiri / MTE

Mars Target Encyclopedia
Apache License 2.0
5 stars 0 forks source link

Respond to Tom Stein's comments on MER-A content #42

Closed wkiri closed 2 years ago

wkiri commented 2 years ago

Tom Stein provided feedback on the MER-A content in the MTE:


wkiri commented 2 years ago

I discussed points 1 and 2 with Matt G. It seems like our best strategy going forward is to include only the targets that appear in the document collection as part of the "targets" table. The original idea of including a "complete" list of target names is just not achievable with resources available. However, we can focus on targets that appear in the mentions table as a complete set of targets in the document collection. This also explains the naming variants, since the MTE is then a summary of targets as mentioned in the literature.

We can achieve this goal by omitting the -tl target list specification to update_sqlite.py, and the final database should default back to only including targets in the annotations.

wkiri commented 2 years ago

@stevenlujpl notes that we might need to check whether "remove orphans" will still be needed, given this update to how targets will be provided.

@stevenlujpl will investigate options about how to generate a schema diagram.

stevenlujpl commented 2 years ago

@wkiri Please see the pdf attached below for the schema diagram generated using ERAlchemy for /proj/mte/sqlite/mte_mer2_all_v1.3.3.db. The diagram looks correct to me, but it doesn't seem there is a way to optimize the layout of the tables such that the dashed lines (relations) won't cross with each other. There are two dashed lines connecting the contains and sentences tables (also two dashed lines connecting has_property and sentences tables) because of the target_sentence_id and component_sentence_id fields in the contains table.

mte_mer2_all_v1.3.3.pdf

I also looked into other tools. They either don't have the ability to generate a schema diagram or I couldn't get them to work (within a reasonable amount of time). If you think this diagram doesn't work, we then probably have to draw it ourselves. Please let me know what you think. Thanks.

wkiri commented 2 years ago

@stevenlujpl Thank you! This is a great start. Interestingly, the lack of a link between the targets and aliases tables suggests to me that maybe we should have a has_alias relation that connects these tables? (instead of requiring the user to look up every target to see if it has an alias). What do you think?

I think I will create a diagram by hand but use this version as a reference. I'll share for your feedback soon!

wkiri commented 2 years ago

Draft schema diagram: MTE-schema

stevenlujpl commented 2 years ago

@wkiri This diagram looks great. Definitely an improvement over the one that is automatically generated. I like that now the arrows are showing actual relations of tables' fields instead of just the general relations between tables. However, I am not sure I understand the directions of the arrows.

I will need to think more about having a has_alias table. I think it could help, but it doesn't seem to be necessary. With or without the has_alias table, users of the MTE DB can still do target_name lookup. Maybe we have a has_alias table for completeness?

wkiri commented 2 years ago

@stevenlujpl Good point. The directionality of the arrows is meaningless, so maybe I should not have arrowheads, or have them on both ends of the arrows?

In the interest of time, I'm inclined to skip a has_alias table for MER-A (mer1), but keep it in mind for when we deliver MER-B (mer2). Sound good?

stevenlujpl commented 2 years ago

@wkiri If the directions of the arrows are meaningless, then we should not include arrowheads.

stevenlujpl commented 2 years ago

A comment on not including data types - I think the schema diagram should not include data types (which is exactly what you are doing now). The examples I've looked at online all include data types (e.g., the diagram I posted above), but they are used to describe DBs. We are describing CSV files, so I think it is completely appropriate to exclude data types in the schema diagram.

wkiri commented 2 years ago

A comment on not including data types - I think the schema diagram should not include data types (which is exactly what you are doing now).

Good point. I meant to say something about that, so thanks for highlighting it. I agree that with the .csv files, data type is less relevant. If we were delivering the DB itself, the data types would be more useful.

wkiri commented 2 years ago

Updated diagram: MTE-schema

wkiri commented 2 years ago

I have addressed all of Tom's comments. I did not do anything for item 7, because "Pr2" does not appear in the mentions table (or any other).

I also added some new quality checks (for the aliases table) to the bundle generation instructions: https://github-fn.jpl.nasa.gov/wkiri/mte/wiki/Generate-and-deliver-MTE-PDS4-bundle-to-PDS

The schema diagram needs to be in PDF/A format, which it turns out is an "archival" format of PDF. I spent some time trying to figure out how to generate this format (PowerPoint does not export in that format; Preview and ImageMagick do, but both outputs were rejected by the validate tool). I instead decided to save the schema diagram as a .jpg file, which is accepted by validate.

I also added this comment to an issue on the validate tool repo urging standard support for double-quotes inside CSV files: https://github.com/NASA-PDS/validate/issues/298#issuecomment-1036819482

wkiri commented 2 years ago

I re-generated database files for MPF and PHX that use the updated logic for what to include in the targets and aliases tables. The new files are:

I also updated the symbolic links to point to these databases as our latest versions (e.g., for the website).

wkiri commented 2 years ago

The new bundle is at /proj/mte/pds-deliveries/bundle_v1.3.5/.

wkiri commented 2 years ago

I sent the bundle to Tom and Scott. I will close this ticket for now.