serratus-bio / open-virome

monorepo for data explorer UI and APIs
http://openvirome.com/
GNU Affero General Public License v3.0

[Map] Simple Layout #100

Open almosnow opened 2 months ago

almosnow commented 2 months ago

Proposal for the Simple and Advanced variations of the Map module.

Also, up for discussion: choosing different words for Simple/Advanced, perhaps Summary/Detail? Idk, the thing is that "Simple" feels like it diminishes its value.

Simple

Image

Advanced

Image

almosnow commented 2 months ago

Review, @ababaian.

ababaian commented 2 months ago

Can you create an SVG of this so I can edit it and lay it out? I can't read the writing very well.

almosnow commented 2 months ago

This will be handy at some point: https://github.com/smucode/react-world-flags

almosnow commented 2 months ago

This thread will now only track the Simple Layout, which is meant to be completed much sooner than the Advanced one [#106]. That one is currently in the Backlog and will be taken out of it once this is ready.

Any comments/suggestions pertaining to the Advanced view now belong to that task.

Subtasks

almosnow commented 2 months ago

Note: 3-decimal precision for (lat, lon) puts you in the range of ~80 m at the equator. We'll go with this one.

ababaian commented 2 months ago

What are the values for each of the decimal places?

almosnow commented 2 months ago

It is a linear map at the equator, so:

2 decimals ~= 800m
1 decimal ~= 8km
1 degree ~= 80km

(Actually, a degree at the equator is about 111 km, so perhaps the site where I got the initial measurement is wrong. Adjust as necessary: under this, 3 decimals would be ~111 m.)
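A minimal sketch of the arithmetic above, assuming a spherical Earth with the WGS-84 equatorial radius (6,378.137 km), which gives ~111.32 km per degree. Under that assumption each extra decimal place divides the resolution by 10, and the east-west size additionally shrinks with cos(latitude). The function name `resolution_m` is hypothetical.

```python
import math

# Spherical approximation using the WGS-84 equatorial radius (assumption);
# real north-south distances per degree vary by under 1% with latitude.
EARTH_RADIUS_M = 6_378_137.0
M_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_M / 360  # ~111,319.5 m


def resolution_m(decimals: int, latitude_deg: float = 0.0) -> tuple[float, float]:
    """Return the approximate (north-south, east-west) size in meters of one
    step in the last kept decimal place of a (lat, lon) pair."""
    step_deg = 10 ** -decimals
    north_south = step_deg * M_PER_DEGREE
    # Meridians converge toward the poles, so a degree of longitude shrinks.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for d in range(4):
    ns, _ = resolution_m(d)
    print(f"{d} decimal(s): ~{ns:,.0f} m at the equator")
```

This prints roughly 111 km, 11 km, 1.1 km and 111 m for 0 to 3 decimals, matching the corrected ~111 m figure for 3 decimals.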

almosnow commented 2 months ago

Btw, grouping them at three decimals + keeping only unique attribute values brings the whole set down to 748,656 lat_lon pairs.
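A minimal sketch of the grouping step, assuming each input row is a (lat, lon, attribute) tuple; the actual attributes and storage behind the map aren't described in the thread, so `unique_rounded_points` and the sample rows are hypothetical. Rounding to 3 decimals snaps nearby coordinates onto the same rounded value, and a set keeps one copy of each (lat, lon, attribute) combination.

```python
from typing import Iterable

Row = tuple[float, float, str]  # (lat, lon, attribute): hypothetical layout


def unique_rounded_points(rows: Iterable[Row], decimals: int = 3) -> set[Row]:
    """Round each coordinate to `decimals` places and drop duplicate
    (lat, lon, attribute) combinations."""
    return {(round(lat, decimals), round(lon, decimals), attr) for lat, lon, attr in rows}


rows = [
    (37.77493, -122.41942, "sample_a"),
    (37.77481, -122.41938, "sample_a"),  # a few meters away: collapses into the row above
    (37.77493, -122.41942, "sample_b"),  # same spot, different attribute: kept
]
unique = unique_rounded_points(rows)
print(len(rows), "->", len(unique))
```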

Down from ~40M, so roughly a 1:50 reduction. I'll now compute the intersections from here.
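One way the intersection step could work, sketched under assumptions: each country is a single polygon ring of (lon, lat) vertices, as in GeoJSON, and a point belongs to a country when an even-odd (ray-casting) test puts it inside that ring. Real borders are multipolygons with holes, so in practice a GIS library such as Shapely or a spatial database such as PostGIS would be the safer choice; the helper names and toy square "countries" below are hypothetical. Counting matches per country code then yields a top-N ranking.

```python
from collections import Counter
from typing import Iterable, Optional

Ring = list[tuple[float, float]]  # polygon vertices as (lon, lat), GeoJSON order


def point_in_ring(lat: float, lon: float, ring: Ring) -> bool:
    """Even-odd rule: cast a ray toward +lon and count edge crossings."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def country_of(lat: float, lon: float, countries: dict[str, Ring]) -> Optional[str]:
    for code, ring in countries.items():
        if point_in_ring(lat, lon, ring):
            return code
    return None  # e.g. ocean samples or gaps between polygons


def top_countries(points: Iterable[tuple[float, float]], countries: dict[str, Ring], n: int = 10):
    counts = Counter()
    for lat, lon in points:
        code = country_of(lat, lon, countries)
        if code is not None:
            counts[code] += 1
    return counts.most_common(n)


# Toy "countries": two squares given as (lon, lat) rings.
countries = {
    "AAA": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "BBB": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
points = [(5, 5), (2, 3), (5, 25), (50, 50)]  # (lat, lon)
print(top_countries(points, countries))
```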

almosnow commented 2 months ago

All countries computed; the top 10 are (only entries w/ reads on palmdb):

"USA"   217018
"CHN"   213204
"GBR"   93182
"AUS"   22788
"JPN"   18114
"CAN"   17464
"DEU"   16127
"FRA"   13841
"CHE"   13632
"IND"   12093
almosnow commented 2 months ago

Now on the frontend: https://github.com/serratus-bio/open-virome/commit/c7b28594e5a629bd3c0c609062191cd3c81be62b.

Still pending:

ababaian commented 1 month ago

In the Simple Layout, the status message doesn't show how many bioSamples have geo data and how many are missing it.

almosnow commented 1 month ago

OK, I'll move it over from the Advanced view (it will now show on both).
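A minimal sketch of the missing status counts, assuming each bioSample record carries optional `lat`/`lon` fields; the `BioSample` class, helper names and message wording are hypothetical, not the site's actual code. Note the explicit `is not None` checks: a coordinate of 0.0 is valid and must not be treated as missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BioSample:
    accession: str
    lat: Optional[float] = None
    lon: Optional[float] = None


def geo_coverage(samples: list[BioSample]) -> tuple[int, int]:
    """Return (with_geo, missing_geo) counts."""
    # Compare against None explicitly: 0.0 is a valid latitude/longitude.
    with_geo = sum(1 for s in samples if s.lat is not None and s.lon is not None)
    return with_geo, len(samples) - with_geo


def status_message(samples: list[BioSample]) -> str:
    with_geo, missing = geo_coverage(samples)
    return f"{with_geo:,} bioSamples with geo data, {missing:,} without."


samples = [
    BioSample("sample_1", 48.85, 2.35),
    BioSample("sample_2"),            # no coordinates
    BioSample("sample_3", 0.0, 0.0),  # valid, not missing
]
print(status_message(samples))
```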

almosnow commented 1 month ago

Some read count values seem to be incorrect, but I will address that in a different issue.

ababaian commented 1 month ago

Sorry to nitpick here, but "spots" has a different meaning in the SRA context and shouldn't be used in this one.

image

Showing 65,536 geographic spots for 631,768 palmprints found on 51,061 runs.

-->

Showing 65,536 geographic points for 631,768 palmprints found in 51,061 runs.

Why are there 65,536 geographic points for only 51,061 runs/bioSamples?

Related to this: is this 631,768 distinct contigs, or that many distinct palmprint-sOTUs?

almosnow commented 1 month ago

Yeah, I did not know which word to use exactly.

We talk about these things by many names (contigs, reads, runs, spots), which are a bit ambiguous. I'll open a small thread where we can elaborate on what each of them means, then use those terms throughout the site and docs so the end user isn't confused either [https://github.com/serratus-bio/open-virome/issues/113].

With respect to:

Why are there 65,536 geographic points for only 51,061 runs/bioSamples?

Most likely, more than one location was inferred for some of those runs (the institutions have not been removed yet).
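To illustrate how the map can end up with more points than runs, here is a minimal sketch assuming one map point per distinct (run, location) pair; the thread doesn't say exactly how points are counted, so that rule, the helper name and the placeholder run IDs are assumptions. A run with two inferred locations (say, a sampling site plus an institution) contributes two points.

```python
from collections import defaultdict


def locations_per_run(rows):
    """Group (run, lat, lon) rows into {run: set of (lat, lon)}."""
    by_run = defaultdict(set)
    for run, lat, lon in rows:
        by_run[run].add((lat, lon))
    return by_run


rows = [
    ("run_1", 51.752, -1.258),  # inferred sampling site
    ("run_1", 51.507, -0.128),  # second inferred location, e.g. an institution
    ("run_2", 51.752, -1.258),
]
by_run = locations_per_run(rows)
n_points = sum(len(locs) for locs in by_run.values())  # assumed: one point per (run, location)
multi_location_runs = sorted(r for r, locs in by_run.items() if len(locs) > 1)
print(n_points, "points for", len(by_run), "runs; multi-location:", multi_location_runs)
```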

is this 631,768 distinct contigs, or that many distinct palmprint-sOTU?

The 65,536 points shown on the map come from 51k runs, which in turn have ~600k different (read, palmprint) pairs. The total number of runs is already shown elsewhere on the site, so I didn't add it here; also, I think the geo module should only show geo data.

Distinct sOTUs (viruses) could also be an interesting number to show.
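A sketch of how the counts discussed here could be derived from one table of hits, assuming each row records the run, read, palmprint and sOTU; the `Hit` fields and placeholder IDs are hypothetical. Distinct (read, palmprint) pairs and distinct sOTUs are separate set sizes, so the two numbers can differ a lot when many palmprints cluster into the same sOTU.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    run: str
    read: str
    palmprint: str
    sotu: str


def summarize_hits(hits: list[Hit]) -> dict[str, int]:
    """Count distinct runs, (read, palmprint) pairs and sOTUs."""
    return {
        "runs": len({h.run for h in hits}),
        "read_palmprint_pairs": len({(h.read, h.palmprint) for h in hits}),
        "distinct_sotus": len({h.sotu for h in hits}),
    }


hits = [
    Hit("run_1", "read_1", "palm_1", "sotu_1"),
    Hit("run_1", "read_2", "palm_2", "sotu_1"),  # different palmprint, same sOTU
    Hit("run_2", "read_3", "palm_1", "sotu_1"),
    Hit("run_2", "read_3", "palm_1", "sotu_1"),  # duplicate row, counted once
]
print(summarize_hits(hits))
```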