jar398 / plotter

Utilities for updating an EOL graphdb

Provide metadata for pages (used in traits) that aren't in the DH #12

Closed: jar398 closed this issue 3 years ago

jar398 commented 3 years ago

Many pages that aren't in the DH show up in trait records. Without page metadata, these trait records are not useful. Either suppress them, or add page records as needed to the traits dump.

Reported by Ray Ma

jhammock commented 3 years ago

I'm game to suppress these from the traits dump since someone cares enough to want one solution or the other. @KatjaSchulz do you have a preference?

KatjaSchulz commented 3 years ago

I think it's ok to suppress them. I am currently working on ways to extend the DH specifically with respect to taxa that turn up in trait data sets, so the number of uncovered taxa should decrease with time.

jar398 commented 3 years ago

On the beta instance, about 25% of Trait nodes are for Pages that aren't in the DH (2,995,201 out of 12,262,690). Suppressing them will make the dump a little smaller and might make the dump process faster (though I don't know; it could make it slower, since the query becomes slightly more complicated). I've implemented suppression, so it should take effect when Eli picks up this version.
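
For illustration only, here is a minimal Python sketch of what suppression amounts to and of the arithmetic behind the "about 25%" figure. It is not the dumper's code: the real tool queries the graph database, which this does not reproduce. The record layout (`trait_id`, `page_id`), the function names, and the sample IDs are all hypothetical; only the two node counts come from the comment above.

```python
from typing import Dict, Iterable, Iterator, Set


def suppress_non_dh_traits(
    traits: Iterable[Dict[str, object]], dh_page_ids: Set[int]
) -> Iterator[Dict[str, object]]:
    """Yield only the trait records whose page is in the DH.

    `page_id` is a hypothetical field name chosen for this sketch.
    """
    for trait in traits:
        if trait["page_id"] in dh_page_ids:
            yield trait


def share_outside_dh(outside: int, total: int) -> float:
    """Fraction of Trait nodes whose Page is not in the DH."""
    return outside / total


if __name__ == "__main__":
    # Made-up sample data: page 999 stands in for a page missing from the DH.
    dh_page_ids = {101, 102}
    traits = [
        {"trait_id": "t1", "page_id": 101},
        {"trait_id": "t2", "page_id": 999},
        {"trait_id": "t3", "page_id": 102},
    ]
    kept = list(suppress_non_dh_traits(traits, dh_page_ids))
    print([t["trait_id"] for t in kept])  # ['t1', 't3']

    # Counts reported for the beta instance.
    print(f"{share_outside_dh(2_995_201, 12_262_690):.1%}")  # 24.4%
```

Holding the DH page IDs in a set keeps each membership test constant-time on average, so the filter is a single pass over the trait records. Whether the equivalent condition is cheap inside the actual graph query is a separate question that only timings on the real instance can answer.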

jhammock commented 3 years ago

paging @eliagbayani in case action is needed to accept this change or in case it is a surprise when the traits export shrinks. :)

jar398 commented 3 years ago

This is going to be a major new release of the traits dumper (and my other tools such as branch painting). When I'm done testing I'll move everything to the master branch and give @eliagbayani upgrade instructions.

eliagbayani commented 3 years ago

@jar398 @jhammock, noted. Thanks.

jar398 commented 3 years ago

I did some simple timings and am fairly confident that suppressing these Page nodes won't slow down the dump process.

I'll wait to close this issue until the full traits dump succeeds (#10).