Closed carlocolantuoni closed 3 years ago
the URLs for the 3 datasets are: https://nemoanalytics.org/p?s=3ff685af https://nemoanalytics.org/p?s=8352d6b5 https://nemoanalytics.org/p?s=79c7a289
Yash and I have both tried curating other datasets on the same computers/browsers, and we have no trouble getting GeneSymbols to come up for those datasets.
Let us know what else we can do to help.
thanks!
The POST request to the API route to get the dataset's genes is throwing an error. I'm about to go onto the server and find out why
The error is 'DataFrame' object has no attribute 'gene_symbol'
Looking at one of the datasets now.
>>> import anndata
>>> adata = anndata.read("./676783af-3879-7ba3-7574-08549d1a53a0.h5ad")
>>> adata.var
Empty DataFrame
Columns: []
Index: [RP11-560A15.3, RPS11, CREB3L1, RPL10P14, PNMA1, RP11-783L4.1, AC092634.2, RP11-798K23.4, TMEM216, TRAF3IP2-AS1, C10orf90, RP1-273G13.1, CTD-2240J17.4, ERCC5, RP11-96K19.5, RP11-201E8.1, APBB2, AC097724.3, KLHL13, RNU4ATAC2P, RP11-360F5.3, CADM4, MIR6500, XXbac-BPG157A10.21, CST2P1, SLC10A7, OR5H5P, CFHR5, OR2K2, LMAN1, RP11-6O2.3, CHD8, SUMO1, BOLA3-AS1, CTD-2193P3.1, IFNWP18, AC016561.1, AC012314.20, RP11-463J10.3, MMP7, MIR1976, RP11-335O4.3, CIR1P2, XAB2, Z85986.1, ADAM21P1, RP11-96B2.1, RN7SL499P, RP11-554L12.2, CTC-487M23.8, RNVU1-14, ZBTB12, UTY, CENPQ, RP4-754E20__A.5, DTNBP1, LINC00683, AC012065.4, RP11-70F11.11, ZG16, RP11-116N8.2, PRKAG2-AS1, MIR582, AC091178.2, AC006499.7, MIER1, RNA5SP93, RP11-384G23.1, ARID3C, RNU7-164P, RP1-39G22.7, WBP1LP6, RP11-271C24.2, TRMT112P4, LLNLR-284B4.1, MIR489, RP11-263I1.1, GRM2, MIR4511, PROSC, RNU1-124P, RP11-309L24.10, CXCL13, RP13-20L14.4, EHHADH-AS1, RP11-201K10.3, RNU6-332P, SYN3, LINC00210, SLC22A2, SERPINF1, WDR34, SUGCT, FAM8A6P, EPT1, BNIP3P5, KB-226F1.2, RP11-74J13.8, LHB, CTD-2515C13.2, ...]
The gene symbols were assigned to the index of the "anndata.var" dataframe, which is not what is expected. Typically the Ensembl ID is assigned to the index and the gene symbols are assigned to a separate "gene_symbol" column.
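As a rough illustration of the two layouts described above, here is a small pandas-only sketch. It does not touch the real .h5ad file or any gEAR code; the helper name `has_gene_symbol_column` and the placeholder index values are made up for this example, and the gene symbols are just a few taken from the output above.

```python
import pandas as pd


def has_gene_symbol_column(var: pd.DataFrame) -> bool:
    """Return True if the var-style dataframe has a 'gene_symbol' column."""
    return "gene_symbol" in var.columns


symbols = ["RPS11", "CREB3L1", "PNMA1"]

# Layout seen in the problem dataset: symbols are the index, no columns at all.
bad_var = pd.DataFrame(index=symbols)

# Expected layout: an ID in the index, symbols in a 'gene_symbol' column.
# The index values are placeholders, not real Ensembl IDs.
good_var = pd.DataFrame(
    {"gene_symbol": symbols},
    index=["ID_PLACEHOLDER_1", "ID_PLACEHOLDER_2", "ID_PLACEHOLDER_3"],
)

# Attribute access on the index-only frame raises the AttributeError reported above.
try:
    bad_var.gene_symbol
    error_message = ""
except AttributeError as err:
    error_message = str(err)

print(has_gene_symbol_column(bad_var), has_gene_symbol_column(good_var))
print(error_message)
```

A check like this on `adata.var` would flag the problem before any code tries to read the "gene_symbol" column.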
@carlocolantuoni is it possible to see one of the original uploaded files you and Yash uploaded?
hey shaun, here is 1 of the tar balls yash uploaded
looks like i attached nothing - shaun is there a way i can send a file here in github or should i email it?
humvchimp.tar.gz https://drive.google.com/file/d/19Frl_-CH2H7xHRcrErf5Ahwi0hjPk2UU/view?usp=drive_web here is 1 of the tar balls yash uploaded
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nemoarchive/analytics/issues/153#issuecomment-869820782, or unsubscribe https://github.com/notifications/unsubscribe-auth/AH7KC7UUCHYJK2AOZFNZP4DTVCOCBANCNFSM47OIH3GQ .
-- Carlo
sent an email with google drive attachment - looks like it worked in the link above - let me know if u can get the file
Received tarball... thanks!
@carlocolantuoni
In the genes.tab file, both the Ensembl ID and the gene_symbol need to be provided, like so (in this random genes.tab file I had on hand).
ensembl_ID gene_symbol
ENSMUSG00000051951 Xkr4
ENSMUSG00000089699 Gm1992
ENSMUSG00000102343 Gm37381
ENSMUSG00000025900 Rp1
ENSMUSG00000109048 Rp1
ENSMUSG00000025902 Sox17
ENSMUSG00000104328 Gm37323
ENSMUSG00000033845 Mrpl15
ENSMUSG00000025903 Lypla1
Lots of gEAR code relies on the "gene_symbol" column in the AnnData object, and if the "gene_symbol" column is the only column uploaded via the genes.tab file, then it is treated as the index column instead. Can you and Yash make this correction and resubmit?
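A quick pre-upload check along these lines might catch the missing column before resubmitting. This is only a sketch using the standard library: it assumes genes.tab is tab-delimited and that the header names match the example above exactly ("ensembl_ID" and "gene_symbol"), and the function name `missing_columns` is hypothetical, not part of gEAR.

```python
import csv
import io

# Assumed header names, copied from the example genes.tab above.
REQUIRED_COLUMNS = ("ensembl_ID", "gene_symbol")


def missing_columns(handle):
    """Return the required column names absent from the file's header row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # empty file -> empty header
    return [name for name in REQUIRED_COLUMNS if name not in header]


# In-memory stand-ins for genes.tab; swap in open("genes.tab", newline="") for a real file.
ok_file = io.StringIO("ensembl_ID\tgene_symbol\nENSMUSG00000051951\tXkr4\n")
symbols_only_file = io.StringIO("gene_symbol\nXkr4\n")

print(missing_columns(ok_file))            # nothing missing
print(missing_columns(symbols_only_file))  # the ID column is missing
```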
will do thnx!
it worked shaun! thanks!
My collaborator Yash in Johns Hopkins Biomedical Engineering has successfully uploaded 3 new datasets to NeMO Analytics, but is encountering a problem when he tries to curate views of these datasets. Below I have included screenshots of the 3 datasets in the Dataset Explorer, as well as a screenshot of where things stall for all 3 datasets in the curator. Basically, it never loads GeneSymbols for the datasets. We have looked at the "genes" column we uploaded, and the entries look like properly formatted GeneSymbols to us. Is there a way to see what is causing the hangup?