serratus-bio / open-virome

monorepo for data explorer UI and APIs
http://openvirome.com/
GNU Affero General Public License v3.0

[Map] Add country statistics to map table #101

[Open] almosnow opened this issue 1 week ago

almosnow commented 1 week ago

Country boundaries are loaded into the db.

Compute the intersection of biosample_geographical_location with the country boundaries, as is currently done for biomes. Add both to the same materialized view as separate columns (rationale: a sample can belong to both a biome AND a country).
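A minimal pure-Python sketch of the per-sample tagging, assuming each location is a (longitude, latitude) point and each boundary is a single simple polygon ring. In the database this would be a spatial join (likely a PostGIS predicate such as `ST_Contains`, which is an assumption here); the function names, toy data, and first-match rule are hypothetical. The point is that biome and country are looked up independently, so one row can fill both columns.

```python
from typing import Optional, Sequence

Point = tuple[float, float]   # (longitude, latitude)
Polygon = Sequence[Point]     # ring; the last vertex need not repeat the first


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: count edges crossed by a ray going +x from pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def first_match(pt: Point, regions: dict[str, Polygon]) -> Optional[str]:
    """Name of the first region containing pt, or None (hypothetical rule)."""
    for name, poly in regions.items():
        if point_in_polygon(pt, poly):
            return name
    return None


def tag_samples(samples: dict[str, Point],
                biomes: dict[str, Polygon],
                countries: dict[str, Polygon]) -> list[dict]:
    """One row per sample with independent, nullable biome and country columns."""
    return [
        {"biosample": acc,
         "biome": first_match(pt, biomes),
         "country": first_match(pt, countries)}
        for acc, pt in samples.items()
    ]


# Toy usage with square "boundaries"
biomes = {"toy_biome": [(0, 0), (10, 0), (10, 10), (0, 10)]}
countries = {"AA": [(0, 0), (5, 0), (5, 5), (0, 5)]}
samples = {"S1": (2, 2), "S2": (7, 7), "S3": (20, 20)}
rows = tag_samples(samples, biomes, countries)
```

Keeping biome and country as two nullable columns in one view, rather than two views, lets a single row answer both "which biome" and "which country" for a sample.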

ababaian commented 1 week ago

Merging #102 here

Once https://github.com/serratus-bio/open-virome/issues/101#issue-2505482380 is ready, we can add large geographical regions on top of the country column. This is a simple aggregation of values; e.g., North America is a fixed list of country codes.

Aside from this, I will add an extra aggregation region, Ocean/Sea, to group everything that does not match a land boundary.
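The region aggregation can be sketched as a lookup table over country codes, with Ocean/Sea as the fallback. This assumes the country column holds ISO 3166-1 alpha-2 codes and is null only when a sample falls outside every land boundary (samples with no location at all would need separate handling). The `REGIONS` contents are a hypothetical, deliberately incomplete example, not the project's actual grouping.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical, incomplete mapping: region -> country codes
REGIONS: dict[str, frozenset[str]] = {
    "North America": frozenset({"US", "CA", "MX"}),
    "South America": frozenset({"BR", "AR", "CL"}),
}
OCEAN_SEA = "Ocean/Sea"

# Invert once so each lookup is a single dict access
COUNTRY_TO_REGION = {code: region
                     for region, codes in REGIONS.items()
                     for code in codes}


def region_for(country_code: Optional[str]) -> Optional[str]:
    """Map a country code to its large region.

    Assumption: a null country means no land boundary matched, so the
    sample goes to Ocean/Sea. Codes not yet assigned to a region -> None.
    """
    if country_code is None:
        return OCEAN_SEA
    return COUNTRY_TO_REGION.get(country_code)


def region_counts(country_codes: Iterable[Optional[str]]) -> Counter:
    """Per-region sample counts, e.g. for the map table statistics."""
    return Counter(region_for(c) for c in country_codes)
```

Because regions are derived purely from the country column, they can be computed at query time and do not require another expensive spatial pass.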

ababaian commented 1 week ago

From #105

We have a table with all the country boundaries, and I am trying to generate a materialized view listing each BioSample and the country boundary it belongs to. We already do this for the biomes, so the same process should have applied trivially.

However, I tried to do this a couple of times yesterday and got this back:

(btw, each of these commands takes about 5 hours to execute, so I cannot just rerun it each time to see what's going on)

If I try to build a smaller (100k-row) materialized view, it works. Postgres should handle this sort of thing automatically, but for whatever reason it isn't.

The plan, then, is to split the creation of the materialized view into as many 100k-row subqueries as needed and combine them all at the end.
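A minimal sketch of the batching plan, generating SQL from Python. Everything schema-related here is hypothetical: the tables `biosample_geo` and `country_boundary`, the `geom` columns, a dense integer `row_id` key, and the use of PostGIS `ST_Contains`. Each batch writes to its own table, and a final `UNION ALL` materialized view combines them. `LEFT JOIN` keeps samples with no land match (country is null), which the Ocean/Sea grouping relies on.

```python
def batch_ranges(total_rows: int, batch_size: int = 100_000) -> list[tuple[int, int]]:
    """Half-open [start, end) ranges covering 0..total_rows in batch_size steps."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [(start, min(start + batch_size, total_rows))
            for start in range(0, total_rows, batch_size)]


def render_batched_sql(total_rows: int, batch_size: int = 100_000) -> list[str]:
    """One CREATE TABLE per batch, then a materialized view that unions them.

    Table and column names are hypothetical placeholders.
    """
    ranges = batch_ranges(total_rows, batch_size)
    if not ranges:
        return []  # nothing to build; avoid emitting an empty UNION
    stmts: list[str] = []
    parts: list[str] = []
    for i, (lo, hi) in enumerate(ranges):
        part = f"biosample_country_part_{i}"
        parts.append(part)
        stmts.append(
            f"CREATE TABLE {part} AS "
            f"SELECT s.biosample, c.country "
            f"FROM biosample_geo s "
            f"LEFT JOIN country_boundary c ON ST_Contains(c.geom, s.geom) "
            f"WHERE s.row_id >= {lo} AND s.row_id < {hi};"
        )
    union = " UNION ALL ".join(f"SELECT * FROM {p}" for p in parts)
    stmts.append(f"CREATE MATERIALIZED VIEW biosample_country AS {union};")
    return stmts


statements = render_batched_sql(250_000)
```

Running each batch as a separate statement means a failure only costs that batch rather than the whole multi-hour build, and the final `UNION ALL` over precomputed tables involves no further spatial work.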


ababaian commented 7 hours ago