Open iDrDex opened 7 years ago
@idrdex Would you like to replace current 2d plots
with something like this http://bl.ocks.org/camio/5087116 ?
What is PMID @idrdex ? How can we count it?
The first set of graphics will show common system information. Each graphic will have a single line representing the total number of items.
The second set of graphics will show the distribution by species. Each graphic will have a button that switches from a single-line to a multi-line representation (one line per species).
The third set of graphics will show the distribution by approval status. Each graphic will have a button that switches from a single line to two lines (the numbers of approved and rejected items).
The fourth set of graphics will show the distribution by user contribution. Each user will have a line with the count of contributed tags.
P.S. For annotations (both Series and Sample) there are three possible options to show
An approved SampleTag is a SampleTag that is related to a SampleAnnotation with best_cohens_kappa.
An approved SerieTag is a SerieTag whose agreed flag is set to True.
Filtering by actuality (not deleted)
SeriesTag: is_active
SampleTag: is_active
SerieValidation: not ignored and not by_incompetent
SampleValidation: # by its serie_validation
Selecting concordant/non-concordant and validated/not-validated
SerieAnnotation: best_cohens_kappa == 1
SampleAnnotation: # by serie_annotation, can't be invalidated separately from it
SeriesTag: agreed is not None
SampleTag: # by its series_tag
SerieValidation: best_kappa == 1
SampleValidation: concordant or serie_validation.best_kappa == 1
Also, for recreating history SeriesTag has several events:
annotation_kappa == 1 -> it becomes validated
agrees_with -> it becomes invalid
Also, when the first SeriesTag-SerieValidation or SerieValidation-SerieValidation match appears, everything else in that group becomes invalid.
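The filtering and selection rules for validations listed above can be sketched as plain predicates. This is only an illustration: the dataclasses are hypothetical stand-ins for the real Django models, and only the field names (`ignored`, `by_incompetent`, `best_kappa`, `concordant`, `serie_validation`) are taken from the list; the function names are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SerieValidation:  # hypothetical stand-in for the Django model
    ignored: bool
    by_incompetent: bool
    best_kappa: Optional[float]


@dataclass
class SampleValidation:  # hypothetical stand-in for the Django model
    concordant: bool
    serie_validation: SerieValidation


def serie_validation_is_actual(sv):
    # SerieValidation: not ignored and not by_incompetent
    return not sv.ignored and not sv.by_incompetent


def sample_validation_is_actual(v):
    # SampleValidation: decided by its serie_validation
    return serie_validation_is_actual(v.serie_validation)


def sample_validation_is_concordant(v):
    # SampleValidation: concordant or serie_validation.best_kappa == 1
    return v.concordant or v.serie_validation.best_kappa == 1
```

In the actual application these checks would more likely be expressed as queryset filters, but the boolean logic would be the same.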
Hi all. Any updates on this issue? I noticed that stargeo.org/stats is still the user statistics and only available to super users. I suggest we rename this current page to stargeo.org/users and make a new stargeo.org/stats page with counts like I mentioned in the initial post for this issue. The stats page must support this claim in the paper that is about to be published:
'To date, over 21,000 PubMed publications have been derived from over 1,000,000 digital samples (see http://STARGEO.org/stats)...'
Hi @idrdex
When the project started, there was no model for storing counters of project items, so the first part of this task was to restore that data. That part of the work is finished.
Now we have all the data needed to create the graphics, and you can see them here. It is a set of very simple graphics that only displays raw data from the database.
My current task is to prettify the graphics and the UI. I will group them by type. When that is ready, I will change the URL of this page to stargeo.org/stats and add it to the main menu.
Hi @idrdex @Suor I have released the first version of the graphics. http://stargeo.org/stats/ What do you think about the result?
Awesome. Thx. Can you add a graph for PMID cumulative distribution? Just plot the sum total of unique PMIDs over time. This is most important for the paper.
What is PMID?
PMID is the PubMed ID of an associated publication derived from the Series data. It maps to a given Series. For instance, see https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51808 as an example Series whose associated publication is referenced by PMID=24981333 (https://www.ncbi.nlm.nih.gov/pubmed/24981333). I know PMID is in the tables somewhere, as I can query it from the JSON that is stored in Postgres. We should probably show the PMID in the search results for every Series returned and link to PubMed, just like GEO does. @Suor may want to chime in.
@ir4y @Suor we need a cumulative graphic of unique PMID counts ASAP. This code will generate a stargeo data frame that you can use to plot it:
    from itertools import chain
    from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

    import pandas as pd
    import requests


    def parse_url(url, params):
        # Add query parameters to a URL, see
        # http://stackoverflow.com/questions/2506379/add-params-to-given-url-in-python
        url_parts = list(urlparse(url))
        query = dict(parse_qsl(url_parts[4]))
        query.update(params)
        url_parts[4] = urlencode(query)
        return urlunparse(url_parts)


    def query_api(url='http://stargeo.org/api/serie_annotations/', limit=1000):
        # Yield every result, following the paginated 'next' links.
        url = parse_url(url, dict(limit=limit))
        while True:
            response = requests.get(url).json()
            for result in response['results']:
                yield result
            if not response['next']:
                break
            url = response['next']


    def query_df(url='http://stargeo.org/api/serie_annotations/', attrs=True):
        df = pd.DataFrame(list(query_api(url)))
        if attrs:
            df = expand_attrs(df)
        return df


    def expand_attrs(df):
        # Expand the 'attrs' column into one column per attribute.
        # df['attrs'] is used because df.attrs is a reserved pandas attribute.
        if 'attrs' in df:
            attrs = pd.DataFrame([dict(attr) for attr in df['attrs']], index=df.index)
            df = df.drop(columns='attrs')
            return df.join(attrs)
        return df


    def read_stargeo():
        print("Querying STARGEO.org...", end=" ", flush=True)
        stargeo = (query_df('http://stargeo.org/api/series/')
                   .sort_values('samples_count', ascending=False)
                   .set_index('id'))
        stargeo.index.name = 'series_id'
        print(len(stargeo.index), "records done!")
        return stargeo


    stargeo = read_stargeo()
    # A pubmed_id cell may hold several IDs separated by "|\n|".
    items = [ids.split("|\n|") for ids in stargeo.pubmed_id.drop_duplicates().dropna()]
    stargeo_pmid = set(chain(*items))
    print(len(stargeo), 'records')
    print(len(stargeo_pmid), 'distinct PMIDs')
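For the cumulative graphic itself, one possible approach is to record the first date on which each PMID appears and then accumulate the number of new PMIDs per date. The sketch below is a standard-library illustration, not the site's implementation. It assumes each record carries some date; the source does not say which date field to use, so the `(day, pubmed_id)` pair layout and the function name are hypothetical. It reuses the "|\n|" separator from the code above.

```python
from datetime import date
from itertools import accumulate


def cumulative_unique_pmids(records, sep="|\n|"):
    """records: iterable of (day, pubmed_id) pairs; pubmed_id may be None
    or hold several IDs joined by `sep` (assumed layout, see lead-in)."""
    first_seen = {}
    for day, ids in records:
        if not ids:
            continue  # series without an associated publication
        for pmid in ids.split(sep):
            # Remember only the earliest day each PMID shows up.
            if pmid not in first_seen or day < first_seen[pmid]:
                first_seen[pmid] = day

    # Count PMIDs first seen on each day, then build the running total.
    new_per_day = {}
    for day in first_seen.values():
        new_per_day[day] = new_per_day.get(day, 0) + 1
    days = sorted(new_per_day)
    totals = accumulate(new_per_day[d] for d in days)
    return list(zip(days, totals))


example = [
    (date(2017, 1, 1), "1|\n|2"),
    (date(2017, 1, 2), "2"),   # already counted on Jan 1
    (date(2017, 1, 3), None),
    (date(2017, 1, 3), "3"),
]
points = cumulative_unique_pmids(example)
```

Each `(day, total)` pair in `points` is one point on the cumulative curve; days that add no new PMID are simply absent and can be forward-filled when plotting.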
Ok, Ilya is on it.
@idrdex I have pushed a graphic for distinct PMIDs distributed by date. http://localhost:8000/stats/
BTW, what is the priority of the other graphics? They are not updating now. I am planning to finish this task after SkinIQ. Is that OK?
Sure. PMIDs were critical as we are about to publish the paper. We should reorganize the tabs on the stats page as well, but let's revisit that after the melanoma app is delivered. Honestly, we need to focus on a full stargeo redesign, but melanoma is the highest priority now. Thx.
We need the above URL with basic stats in one place.
Two basic sets of stats: 1) Reference, and 2) Generated.
1) Reference stats are counts of everything we already keep track of and sync from GEO: Series, Samples, Platforms, Probes, plus some more that we need to count: PMIDs. I'd like to see some cumulative graphical distribution (across species, maybe) like: https://www.ncbi.nlm.nih.gov/core/lw/2.0/html/tileshop_pmc/tileshop_pmc_inline.html?title=Click%20on%20image%20to%20zoom&p=PMC3&id=3531084_gks1193f1p.jpg
2) Generated stats are counts of everything we generate through the STARGEO.org front end: Users, Tags, Annotations, etc. We need cumulative graphing abilities here too.
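The cumulative per-species view mentioned in item 1 could be computed roughly as follows. This is a hedged sketch using only the standard library: the `(day, group)` event layout and the function name `cumulative_by_group` are assumptions for illustration, and the real data would come from the database counters.

```python
from collections import Counter, defaultdict
from itertools import accumulate


def cumulative_by_group(events):
    """events: iterable of (day, group) pairs, one per created item.
    Returns {group: [(day, running_total), ...]} with days in sorted order."""
    per_group = defaultdict(Counter)
    for day, group in events:
        per_group[group][day] += 1

    lines = {}
    for group, counts in per_group.items():
        days = sorted(counts)
        totals = accumulate(counts[d] for d in days)
        lines[group] = list(zip(days, totals))
    return lines


# ISO date strings sort chronologically, so they work as keys here.
example = [
    ("2017-01-01", "human"),
    ("2017-01-01", "human"),
    ("2017-01-02", "mouse"),
    ("2017-01-03", "human"),
]
lines = cumulative_by_group(example)
```

Each entry of `lines` is one line of a multi-line chart; summing the per-group counts per day before accumulating would give the single total line instead.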