Open almosnow opened 2 months ago
Review, @ababaian .
Can you create an SVG of this so I can edit it and lay it out? I can't read the writing very well.
This will be handy at some point: https://github.com/smucode/react-world-flags
This thread will now only track the Simple Layout which is meant to be completed much sooner than the Advanced one [#106], which is currently in the Backlog and will be taken out once this is ready.
Any comments/suggestions pertaining to the Advanced view now belong to that task.
Subtasks
Note: 3 decimal places of precision (lat, lon) puts you in the range of ~80m at the equator. We'll go with this one.
What are the values for each of the decimal places?
It is a linear map at the equator, so:
2 decimals ~= 800m
1 decimal ~= 8km
1 degree ~= 80km
(But a degree at the equator is actually equivalent to about 111 km, so perhaps the site where I got the initial measurement is wrong. Anyway, adjust as necessary; 3 decimals would be ~111m under this.)
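The figures per decimal place can be rechecked with a quick calculation. This is only a sketch, assuming a spherical Earth with the WGS84 equatorial radius (6,378,137 m); the function name is made up for illustration and is not from the repository.

```python
import math

# Assumption: WGS84 equatorial radius in metres.
EQUATORIAL_RADIUS_M = 6_378_137.0


def metres_per_step_at_equator(decimals: int) -> float:
    """Distance at the equator covered by one unit of the last kept decimal place."""
    metres_per_degree = 2 * math.pi * EQUATORIAL_RADIUS_M / 360
    return metres_per_degree / 10 ** decimals


for d in range(4):
    print(f"{d} decimals ~= {metres_per_step_at_equator(d):,.1f} m")
```

This gives roughly 111 km per degree and roughly 111 m at 3 decimals. Away from the equator the longitude spacing shrinks with the cosine of the latitude, so these are upper bounds for lon.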
Btw, grouping them at three decimals + keeping only unique attribute values brings the whole set down to 748,656 lat_lon pairs.
Down from ~40M so 1:40 improvement, I'll now compute the intersections from here.
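The grouping step could look something like the sketch below: round every pair to three decimals and keep the unique results. The sample coordinates and the `group_lat_lon` name are hypothetical, and the real pipeline may well do this in SQL rather than Python.

```python
def group_lat_lon(pairs, decimals=3):
    """Round each (lat, lon) pair and keep only the unique rounded pairs."""
    return sorted({(round(lat, decimals), round(lon, decimals)) for lat, lon in pairs})


# Hypothetical sample data, not taken from the actual dataset.
sample = [
    (51.50741, -0.12779),
    (51.50749, -0.12781),  # falls in the same 3-decimal cell as the first pair
    (35.6762, 139.6503),
]

unique = group_lat_lon(sample)
print(len(sample), "->", len(unique))
```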
All countries computed, top 10 are (only entries w/ reads on palmdb):
"USA" 217018
"CHN" 213204
"GBR" 93182
"AUS" 22788
"JPN" 18114
"CAN" 17464
"DEU" 16127
"FRA" 13841
"CHE" 13632
"IND" 12093
On frontend now, https://github.com/serratus-bio/open-virome/commit/c7b28594e5a629bd3c0c609062191cd3c81be62b.
Still pending:
For the Simple Layout
In the status message, the count of how many bioSamples have Geo-data and how many are missing data is not there.
Ok, will move from the Adv. view (will now show on both).
Some read count values seem to be incorrect, but I will address that in a different issue.
Sorry to nit-pick here, but "spots" has a different meaning in the SRA context and shouldn't be used in this context.
Showing 65,536 geographic spots for 631,768 palmprints found on 51,061 runs.
-->
Showing 65,536 geographic points for 631,768 palmprints found in 51,061 runs.
Why are there 65,536 geographic points for only 51,061 runs/bioSamples?
Related to this; is this 631,768 distinct contigs, or that many distinct palmprint-sOTU?
Yeah, I did not know which word to use exactly.
We talk about these things by many names (contigs, reads, runs, spots), which are a bit ambiguous. I'll open a small thread where we can elaborate on what each of these things means, then use that throughout the site and docs so the end user is not confused either [https://github.com/serratus-bio/open-virome/issues/113].
Wrt. "Why are there 65,536 geographic points for only 51,061 runs/bioSamples?":
Most likely more than one location was inferred for some of those runs. (The institutions have not been removed yet.)
Wrt. "is this 631,768 distinct contigs, or that many distinct palmprint-sOTU?":
The 65,536 points that are shown on the map come from 51k runs, which in turn have ~600k different (read, palmprint) pairs. The total # of runs is already shown elsewhere on the site; I didn't add it here because of that, and also because I think the geo module should only show geo data.
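To make the relationship between the three numbers concrete, here is a toy sketch of how they could be derived from the same rows. The record layout and values are hypothetical, not the actual schema; it only shows why points can outnumber runs when a run has more than one inferred location.

```python
# Hypothetical rows: (run, read, palmprint, (lat, lon)).
records = [
    ("RUN1", "read_a", "u1", (51.507, -0.128)),
    ("RUN1", "read_b", "u2", (51.507, -0.128)),
    ("RUN1", "read_a", "u1", (48.857, 2.352)),  # same run, second inferred location
    ("RUN2", "read_c", "u1", (35.676, 139.650)),
]

points = {loc for _, _, _, loc in records}
runs = {run for run, _, _, _ in records}
pairs = {(read, palmprint) for _, read, palmprint, _ in records}

message = (
    f"Showing {len(points):,} geographic points for "
    f"{len(pairs):,} palmprints found in {len(runs):,} runs."
)
print(message)
```

Here two runs produce three points, because RUN1 has two inferred locations.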
Distinct sOTUs (viruses) could also be an interesting # to show.
Proposal for the Simple and Advanced variations of the Map module.
Also, to discuss: choosing different words for Simple/Advanced, perhaps Summary/Detail? Idk, the thing is that "Simple" feels like it diminishes its value.
Simple
Advanced