rezacsedu / topfed

Automatically exported from code.google.com/p/topfed

Problem Reproducing Use-Cases #1

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Hello!

I encountered some problems while trying to reproduce the example use-case 
SPARQL queries from your article "Linked Cancer Genome Atlas Database".

For example, the first SPARQL query shown in the article (Listing 1) matches 
the string "BRCA" as the object of the predicate <http://tcga.deri.ie/schema/tumor_type>, 
which should return all subjects that have "BRCA" as their tumor type.
When I execute this single triple pattern on its own:

select *
where {
 ?uri <http://tcga.deri.ie/schema/tumor_type> ?type.
} limit 10

I only get results like this:

uri type
http://tcga.deri.ie/TCGA-B9-4116    Type 1

So there is not a single match for "BRCA", only "Type 1" and "Type 2". I also 
found at least one predicate that is used in the use-case example but is missing 
from my local Virtuoso copy, namely tcga:gene_symbol.
Another problem is that I couldn't work out how you linked the dataset to 
Bio2RDF's HGNC and OMIM. I tried some queries but couldn't find any 
cross-references.
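The three symptoms above (unexpected tumor_type values, a missing predicate, no visible Bio2RDF links) can each be checked with a small query against the local endpoint. The following is a minimal sketch, not the queries from the article: it assumes the tcga: prefix expands to http://tcga.deri.ie/schema/ (inferred from the tumor_type IRI above) and that Bio2RDF resources have IRIs starting with http://bio2rdf.org/hgnc: or http://bio2rdf.org/omim:. The helper names are hypothetical. The Python only builds the SPARQL text, which can be pasted into the localhost:8890/sparql form.

```python
# Hypothetical diagnostic queries for a local copy of Linked TCGA.
# Assumptions (not taken from the TopFed project): the tcga: prefix expands
# to http://tcga.deri.ie/schema/, and Bio2RDF resources have IRIs that start
# with http://bio2rdf.org/<namespace>: (e.g. hgnc, omim).
TCGA = "http://tcga.deri.ie/schema/"
BIO2RDF = "http://bio2rdf.org/"


def distinct_values_query(predicate: str, limit: int = 100) -> str:
    """List the distinct objects of a tcga: predicate, e.g. to see which
    tumor_type labels the loaded data actually contains."""
    return (
        "SELECT DISTINCT ?value WHERE {\n"
        f"  ?s <{TCGA}{predicate}> ?value .\n"
        f"}} LIMIT {limit}"
    )


def subjects_with_value_query(predicate: str, value: str, limit: int = 10) -> str:
    """Match a value via STR() so that plain, language-tagged and typed
    literals with the same lexical form all match.
    The value is inserted verbatim, so it must not contain double quotes."""
    return (
        "SELECT ?s WHERE {\n"
        f"  ?s <{TCGA}{predicate}> ?v .\n"
        f'  FILTER(STR(?v) = "{value}")\n'
        f"}} LIMIT {limit}"
    )


def predicate_exists_query(predicate: str) -> str:
    """ASK whether any triple in the store uses the predicate at all."""
    return f"ASK {{ ?s <{TCGA}{predicate}> ?o }}"


def bio2rdf_links_query(namespace: str = "", limit: int = 10) -> str:
    """Find triples whose object IRI points into Bio2RDF, optionally
    restricted to one namespace such as 'hgnc' or 'omim'.
    STRSTARTS is SPARQL 1.1; older engines may need REGEX instead."""
    prefix = BIO2RDF + (namespace + ":" if namespace else "")
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        "  ?s ?p ?o .\n"
        f'  FILTER(isIRI(?o) && STRSTARTS(STR(?o), "{prefix}"))\n'
        f"}} LIMIT {limit}"
    )


print(distinct_values_query("tumor_type"))
print(subjects_with_value_query("tumor_type", "BRCA"))
print(predicate_exists_query("gene_symbol"))
print(bio2rdf_links_query("hgnc"))
```

If the DISTINCT query only returns labels such as "Type 1" and "Type 2", that suggests the loaded data does not contain "BRCA" under that predicate at all. The Bio2RDF query only looks at links where the Bio2RDF IRI is the object; links stored the other way round (Bio2RDF IRI as subject) would need the triple pattern swapped.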

My workflow so far:
1. Downloaded the Blue1-virtuoso-server from the link on the start page here:
https://docs.google.com/file/d/0BzemFAUFXpqOeTR5NWdoMG9hRms/edit?usp=sharing
2. Ran start.bat
3. Executed the SPARQL query above on localhost:8890/sparql
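Step 3 can also be scripted, which makes it easier to rerun the same checks after reloading data. A minimal sketch using only the Python standard library, assuming the portable Virtuoso started by start.bat listens on localhost:8890 and answers standard SPARQL 1.1 Protocol GET requests with JSON results when sent `Accept: application/sparql-results+json`. `build_request` and `run_select` are hypothetical helper names, and `run_select` only handles SELECT results (not ASK).

```python
import json
import urllib.parse
import urllib.request

# Assumption: the local Virtuoso endpoint from step 3 of the workflow.
ENDPOINT = "http://localhost:8890/sparql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Encode the query as the 'query' GET parameter and ask for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(query: str, endpoint: str = ENDPOINT) -> list:
    """Send a SELECT query and return each result row as {variable: value}."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as resp:
        data = json.load(resp)
    return [
        {var: binding["value"] for var, binding in row.items()}
        for row in data["results"]["bindings"]
    ]


# Usage (requires the server from step 2 to be running):
# rows = run_select(
#     "SELECT * WHERE { ?uri <http://tcga.deri.ie/schema/tumor_type> ?type . } LIMIT 10"
# )
# for row in rows:
#     print(row["uri"], row["type"])
```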

I really appreciate your work and your making it publicly available; making 
Virtuoso portable is especially convenient!
Any thoughts or hints on what I did wrong or what I am missing would be great :)

Best Regards,
Dominik

Original issue reported on code.google.com by schweige...@gmail.com on 5 Aug 2013 at 10:02

GoogleCodeExporter commented 8 years ago
Hi Dominik, 
Thanks for your interest in our work. We have two papers from the TCGA project: 
1) Linked Cancer Genome Atlas (accepted for the Linked Data Cup challenge at 
I-Semantics 2013) and 2) TopFed: TCGA Tailored Federated Query Processing and 
linking to LOD (under review at BMC Bioinformatics). The first targets the 
use cases of our Linked TCGA, while the latter gives full details of the project 
along with a smart federated query processing engine tailored for TCGA data. The 
Virtuoso server you are referring to is used in the second contribution. 
Soon the majority of the TCGA data will be available via SPARQL endpoints with a 
federated query processing interface. Regarding the linking, please refer to 
Listing 1 in 
http://publicationslist.org/data/muhammad-saleem/ref-2/Linked%20TCGA-I-Semantics2013-final.pdf. 
Hope this answers your questions. Please feel free to contact me 
at saleem.muhammd@gmail.com with any further queries.  

Original comment by saleem.m...@gmail.com on 11 Aug 2013 at 1:16