Closed carlocolantuoni closed 4 years ago
Carlo, how many cells in each of these datasets? As a test can you keep the cell number but limit the metadata to the minimal necessary for your displays and analyses?
From: Carlo Colantuoni notifications@github.com Sent: Saturday, June 20, 2020 3:37:34 AM To: nemoarchive/analytics analytics@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [nemoarchive/analytics] cant upload to NeMO for cortical development profile (#114)
after getting the meta data in, during the expression file upload: i have tried to upload 2 different files repeatedly and am getting: "Oops! File upload failed. Try again and contact us if this continues." one was ~270MB, the other ~170MB - is there a timeout? or size limit? something else?
You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub (https://github.com/nemoarchive/analytics/issues/114), or unsubscribe (https://github.com/notifications/unsubscribe-auth/AEFF5X44DH6TS2P2JDNLWHTRXRRL5ANCNFSM4ODIUURA).
Carlo - when we built the new server we failed to increase the file size upload limit initially on the web server. I have fixed this and did a quick demo upload. Please try again.
i am editing the CarloTEMP profile in NeMO Curator account - when ready this should be moved up to be the default profile for the Cortical Development profile.Joshua and i interacted last night and he was able to get the static tSNE visualization to work in a way that i can bring in new data sets in that visualization. these data have been in: /autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/PrimateDevoAtlassince march, but havent been pulled in, so for this short deadline i am re formatting them all for manual upload.i couldnt get the uploader to work last night. joshua just fixed the file size limit that was a problem for this. now its telling me "Oops! No observations sheet found. Expected spreadsheet sheet named 'observations'."but there is an observations tab. - any idea what to try here? joshua - can u run the chron job to grab them from the dir with all the data in it?or should we continue to trouble shoot with the manual uploader?
fyi - this is not single cell data. it is laser captured samples on microarray from the allen brain.there are 1093 samples in the smaller one and 1855 in the larger. only 12000 rows/genes. -- Carlo
sorry for the formatting on that msg - here is the same text parsed into readable sentences:
i am editing the CarloTEMP profile in NeMO Curator account - when ready this should be moved up to be the default profile for the Cortical Development profile. Joshua and i interacted last night and he was able to get the static tSNE visualization to work in a way that i can bring in new data sets in that visualization.
The data im trying to upload now have been in: "/autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/PrimateDevoAtlas" since march, but havent been pulled in, so for this short deadline i am re formatting them all for manual upload. i couldnt get the uploader to work last night. but joshua just fixed the file size limit that was a problem for this.
now its telling me "Oops! No observations sheet found. Expected spreadsheet sheet named 'observations'." but there is an observations tab. - any idea what to try here?
joshua - can u run the chron job to grab them from the dir with all the data in it?or should we continue to trouble shoot with the manual uploader?
fyi - this is not single cell data. it is laser captured samples on microarray from the allen brain.there are 1093 samples in the smaller one and 1855 in the larger. only 12000 rows/genes.
carlo
Carlo, Yang is happy to help you.
Do I need to run the NeMO uploader cron? If so I can do that, but I don't know whether the datasets you put there have made it through the indexing steps that make them visible in the files the cron reads. Shaun would have to help check there.
JO
@carlocolantuoni I just did a quick check for that directory and while the /autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain exists, the "PrimateDevoAtlassince" sub-directory does not exist. Checked our "processing" and "release" areas and same deal.
Sorry guys, there was a typo in the path, just drop the "since" from the end
If we can run the chron that would b great. Does that include the automatic processing that gets things ready for the static tSNE joshua?
Here are 5 sub directories in /autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/PrimateDevoAtlas. Each contains a data set that would be good to get into NeMO for this week.
Hi Carlo, Did you check that the column headers match the convention that Yang mentioned? Best, Ronna
if the cron job can run we will avoid all the reformatting issues, so i think thats the way to go here if we can - joshua - is that possible?
@adkinsrs I have only run the google-cloud side after @apaala runs the IGS-side script here:
https://github.com/nemoarchive/analytics/blob/master/cron_uploader/nemo_upload_crawler.py
In the code there it references a config file I'm not familiar with:
    conf_loc = os.path.join(os.path.dirname(__file__), '.conf.ini')
    if not os.path.isfile(conf_loc):
        sys.exit("Config file could not be found at {}".format(conf_loc))
Have you run this before or know the values that should be here?
@jorvis Yes, the .conf.ini file is an editable configuration file that I added to the cron_uploader directory on the IGS filesystem. I had previously noticed that nemo_upload_crawler.py had some hardcoded values, and that nemo_gcloud_processor.py duplicated much of what the former script does.
However, due to security concerns I did not commit the .conf.ini file to GitHub. Where are you running the script from? If you are not running it on the IGS servers, I can paste the configuration for you on Slack or elsewhere.
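For anyone reading later: the check quoted above only confirms that the file exists. Below is a minimal sketch of how an uncommitted INI file like this could be loaded with the standard `configparser` module. The section and key names (`paths`, `incoming_dir`) and the `load_conf` helper are hypothetical, since the real config was never committed and its contents are not shown in this thread.

```python
import configparser
import os
import tempfile


def load_conf(conf_loc):
    """Read an INI file, failing loudly if it is missing."""
    if not os.path.isfile(conf_loc):
        raise FileNotFoundError("Config file could not be found at {}".format(conf_loc))
    parser = configparser.ConfigParser()
    parser.read(conf_loc)
    return parser


# Demo with a throwaway file; the section and key names are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".conf.ini")
    with open(demo_path, "w") as fh:
        fh.write("[paths]\nincoming_dir = /tmp/incoming\n")
    conf = load_conf(demo_path)
    incoming_dir = conf.get("paths", "incoming_dir")
    print(incoming_dir)
```

Keeping the file out of version control and shipping only a template with placeholder values is one common way to share the expected keys without exposing the real settings.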
I was running it on IGS servers. Sending a copy in Slack is fine.
Thanks for stayin on this guys!
@carlocolantuoni Shaun and I looked here but the directory you referenced doesn't seem to exist:
/autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/PrimateDevoAtlas
I dont kno what im getting wrong in the path. Its gotta b under:
/autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/
Shaun didnt u say u got into this dir? Wats in there? All there dirs in there are NeMO data.
I will chk wen im bak at a cpu in 30min.
Carlo
@carlocolantuoni
When I do

    ls -l /autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/

I get:

    drwxr-sr-x. 2 ccolantuoni nemo  568 Mar 29 02:39 BrainSpanBulkDevo
    drwxr-sr-x. 2 ccolantuoni nemo 2350 Jun 20 17:19 NHPforeBrainDevoAllenLMD.AllRegion
    drwxr-sr-x. 2 ccolantuoni nemo  483 Mar 31 02:32 NHPforeBrainDevoAllenLMD.AmygThal
    drwxr-sr-x. 2 ccolantuoni nemo  599 Jun 20 17:17 NHPforeBrainDevoAllenLMD.CtxDevo
    drwxr-sr-x. 2 ccolantuoni nemo  490 Mar 31 02:34 NHPforeBrainDevoAllenLMD.CtxLayers
    drwxr-sr-x. 2 ccolantuoni nemo  462 Mar 31 02:35 NHPforeBrainDevoAllenLMD.Hippo

I also did

    find /autofs/encrypted/NEMO/incoming/brain -type d -name "*PrimateDevoAtlas*"

and did not come up with any results either.
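The same directory search can be scripted if it needs to run from Python rather than the shell. This is only a sketch using `os.walk` and `fnmatch`; the `find_dirs` helper is a made-up name, and the demo runs against a temporary tree that mimics two of the directory names listed above, not the real NeMO filesystem.

```python
import fnmatch
import os
import tempfile


def find_dirs(root, pattern):
    """Return paths of directories under root whose name matches pattern."""
    matches = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Demo on a temporary tree that mimics two of the directory names above.
with tempfile.TemporaryDirectory() as root:
    for name in ("BrainSpanBulkDevo", "NHPforeBrainDevoAllenLMD.Hippo"):
        os.makedirs(os.path.join(root, "AllenBrain", name))
    atlas_hits = find_dirs(root, "*PrimateDevoAtlas*")
    nhp_hits = [os.path.basename(p) for p in find_dirs(root, "NHPforeBrainDevo*")]
    print(atlas_hits, nhp_hits)
```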
yes - all those except "BrainSpanBulkDevo" are to go into NeMO ( BrainSpanBulkDevo is in already)
sorry about the incorrect path - all these are to go in to NeMO:

    drwxr-sr-x. 2 ccolantuoni nemo 2350 Jun 20 17:19 NHPforeBrainDevoAllenLMD.AllRegion
    drwxr-sr-x. 2 ccolantuoni nemo  483 Mar 31 02:32 NHPforeBrainDevoAllenLMD.AmygThal
    drwxr-sr-x. 2 ccolantuoni nemo  599 Jun 20 17:17 NHPforeBrainDevoAllenLMD.CtxDevo
    drwxr-sr-x. 2 ccolantuoni nemo  490 Mar 31 02:34 NHPforeBrainDevoAllenLMD.CtxLayers
    drwxr-sr-x. 2 ccolantuoni nemo  462 Mar 31 02:35 NHPforeBrainDevoAllenLMD.Hippo
@carlocolantuoni Fortunately the files are already bundled via the NeMO Archive ingest process, so I will do the upload process now. Will post again when it completes.
Great! Thanks! Sorry again for the path run-around.
Hi @carlocolantuoni
Unfortunately none of the 3tab files were able to be uploaded because the EXPmeta.JSON file did not validate against the "Metadata Validator" module in gEAR. Specifically the following fields are required and were left blank in the JSON files:
If you can provide me this information I can populate these fields and rerun
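A quick pre-check like the one below could catch blank fields before a file reaches the validator. This is a sketch only: the field names in `REQUIRED_FIELDS` are hypothetical stand-ins (the validator's actual required fields are not listed in this thread), and `blank_required_fields` is not part of gEAR.

```python
import json

# Hypothetical field names; the validator's real required fields are not
# listed in this thread.
REQUIRED_FIELDS = [
    "annotation_source",
    "annotation_release_number",
    "contact_name",
    "contact_email",
]


def blank_required_fields(metadata, required=REQUIRED_FIELDS):
    """Return the required keys that are missing or blank."""
    blank = []
    for key in required:
        value = metadata.get(key)
        if value is None or str(value).strip() == "":
            blank.append(key)
    return blank


# Made-up metadata with one empty, one whitespace-only and one absent field.
example = json.loads(
    '{"annotation_source": "", "contact_name": "  ", "contact_email": "someone@example.org"}'
)
missing = blank_required_fields(example)
print(missing)
```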
ENSEMBL 19 Ed Lein EdL@alleninstitute.org
Thanks!
Same info for all 5 datasets
Hi Carlo, The annotation release refers to the version of the ENSEMBL transcripts (current = v94), rather than to the version of the reference genome. Seth
Thnx - can u put that in shaun? Sori
@seth-ament or @carlocolantuoni I will put in v94 for ENSEMBL if that is fine
yes - thnx
Hit another snag.
The uploader script is failing validation because the taxon ID (9544 - Macaca mulatta) does not exist in the gEAR database "organism" table. @jorvis We should add that in the database for gEAR and NeMO analytics.
In the meantime I am going to assign it the "organism" table ID of 8. The nemo_upload_crawler.py script validates against a dictionary of organisms to table IDs in the script rather than retrieve the organisms from the gEAR or NeMO Analytics database, so I am going to add this ID to the dictionary so this will validate.
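To illustrate the kind of lookup described above, here is a rough sketch of validating a taxon ID against a hard-coded dictionary. It is not the code in nemo_upload_crawler.py: the names `ORGANISM_IDS` and `organism_id_for` are invented, and only the 9544 to 8 pairing comes from this thread. The table ID shown for 9606 (Homo sapiens) is a placeholder.

```python
# Taxon ID -> row ID in the "organism" table. Only the 9544 -> 8 pairing
# comes from this thread; the 9606 entry is a placeholder for illustration.
ORGANISM_IDS = {
    9606: 2,  # Homo sapiens (table ID assumed)
    9544: 8,  # Macaca mulatta (ID assigned in this thread)
}


def organism_id_for(taxon_id, table=ORGANISM_IDS):
    """Map a taxon ID to an organism table ID, or raise if it is unknown."""
    try:
        return table[int(taxon_id)]
    except KeyError:
        raise ValueError(
            "Taxon ID {} is not in the organism lookup".format(taxon_id)
        ) from None


macaque_id = organism_id_for("9544")
print(macaque_id)
```

A dictionary like this has to be kept in step with the database by hand, so reading the organisms from the database at startup would avoid the two drifting apart.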
I think we can use homo sapiens here - the gene symbols and ensmbl gene ids i am using are human, so go ahead and change it.
Update:
I am still having difficulties uploading the data. The data can be converted into H5AD, but one of the steps in reading the H5AD files is failing because it thinks that the DataMTX.tab file has 1 more gene than the ROWmeta.tab file. I believe it is erroneously trying to index the "DataRowNames" label as a gene, but I am unsure how to resolve this.
It's honestly getting very close for me to go to bed, so I don't know how much longer I can go at this.
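One way to confirm which label is being counted as the extra gene is to compare the first column of the two files directly. The sketch below uses tiny made-up stand-ins for DataMTX.tab and ROWmeta.tab; the assumed layout (a header row whose first cell is "DataRowNames") is a guess based on the description above, not a statement about the real files.

```python
import csv
import io


def first_column(text, skip_header):
    """Return the first field of every non-empty row in tab-delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if skip_header:
        rows = rows[1:]
    return [row[0] for row in rows if row]


# Tiny made-up stand-ins for DataMTX.tab and ROWmeta.tab.
data_mtx = "DataRowNames\tS1\tS2\nGeneA\t1.0\t2.0\nGeneB\t3.0\t4.0\n"
row_meta = "DataRowNames\tsymbol\nGeneA\tA\nGeneB\tB\n"

meta_genes = first_column(row_meta, skip_header=True)
known = set(meta_genes)

# Reading the matrix without skipping its header reproduces the off-by-one.
naive_genes = first_column(data_mtx, skip_header=False)
extra = [g for g in naive_genes if g not in known]

# Skipping the header brings the two gene lists back in line.
fixed_genes = first_column(data_mtx, skip_header=True)
print(extra, fixed_genes == meta_genes)
```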
thanks shaun - go to bed - ill work on other routes to get it and others in
Thanks for all your help, Shaun! Agreed -- get some sleep!
@carlocolantuoni fine to get these initial datasets uploaded by another approach as a "quick fix", but I don't want to give up on finishing the work that Shaun and Joshua have been helping with this weekend to get the data automatically uploaded. We will need to work through these issues in order to build out the integration of NeMO Archive and Analytics, and that's pretty central to the project.
I think I may have gotten it to go through! Made an AnnData dataframe subview without the troublesome row, and the H5AD file was created and stuff is now uploading to @jorvis's GCP bucket.
Will have to do some kind of rewrite on that part of the script to utilize dataframe subviews to avoid a similar error in the future.
Fantastic, Shaun!
u r a hero @adkinsrs i totally agree @seth-ament
is there another step once things are in the GCP bucket? i am not seeing the data in NeMO curator when i search in the data set manager.
I have to process them from the bucket. Working on it.
guys, thanks again for hittin this so much today
I think there’s still one step that Joshua has to do manually to make it available to curate. Hopefully he can get to that in the morning.
Shaun and I are working on it.
Carlo should be able to see them now:
    mysql> select id, date_added, title, is_public from dataset where owner_id = 483;
    +--------------------------------------+---------------------+------------------------------------+-----------+
    | id                                   | date_added          | title                              | is_public |
    +--------------------------------------+---------------------+------------------------------------+-----------+
    | 0017f74a-924a-4e55-86f6-674b5d98d5e6 | 2020-06-22 14:57:58 | NHPforeBrainDevoAllenLMD.Hippo     |         1 |
    | 61f42eef-30f7-4263-b630-6e78bdeb47e0 | 2020-06-22 15:21:37 | NHPforeBrainDevoAllenLMD.CtxLayers |         1 |
    | 8319fdef-5aea-4af8-b1ac-fc4b59b5ceef | 2020-06-22 15:21:38 | NHPforeBrainDevoAllenLMD.AllRegion |         1 |
    | d83a2fda-7806-4ddf-ba92-e46317bbb998 | 2020-06-22 15:21:39 | NHPforeBrainDevoAllenLMD.CtxDevo   |         1 |
    | e56788e1-eb7b-48ce-a22d-dde67ac281f0 | 2020-06-22 15:21:41 | NHPforeBrainDevoAllenLMD.AmygThal  |         1 |
    +--------------------------------------+---------------------+------------------------------------+-----------+
    5 rows in set (0.00 sec)
Any luck finding where the observations were lost?
Not yet. Will continue after our pre-meeting meeting in 17 minutes.
Carlo, please try the dataset whose name ends with 'Hippo' now. The issue was that all the obs column names are numeric, which scanpy/pandas balks at. I created a utility to correct these and then used other steps to manually fix and upload the new h5. If this one works for you I'll do the same on the other four. My notes:

    $ cut -f 1,2,3,4,6- NHPforeBrainDevoAllenLMD.Hippo_ROWmeta.tab > foo
    $ mv foo NHPforeBrainDevoAllenLMD.Hippo_ROWmeta.tab
    $ ~/git/gEAR/bin/h5ad_fix_numeric_headers.py -i ./ -f NHPforeBrainDevoAllenLMD.Hippo -c X
    $ ~/git/gEAR/bin/h5ad_convert_from_3tab.py -i ./ -f NHPforeBrainDevoAllenLMD.Hippo -o ./NHPforeBrainDevoAllenLMD.Hippo.h5ad
    $ ~/git/gEAR/bin/add_ensembl_ids_to_h5ad.py -i NHPforeBrainDevoAllenLMD.Hippo.h5ad -o 0017f74a-924a-4e55-86f6-674b5d98d5e6.h5ad -org 2 -er 92
    $ gcloud compute scp 0017f74a-924a-4e55-86f6-674b5d98d5e6.h5ad nemo-prod-202006:/home/jorvis/git/gEAR/www/datasets/
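For reference, the idea behind the header fix can be sketched in a few lines. This is not the code in h5ad_fix_numeric_headers.py: `fix_numeric_headers` is an illustrative name, and treating the `-c X` argument above as a prefix character is an assumption.

```python
def fix_numeric_headers(headers, prefix="X"):
    """Prefix column names that parse as numbers so they are no longer numeric."""
    fixed = []
    for name in headers:
        try:
            float(name)
        except ValueError:
            # Not a number, so keep the name unchanged.
            fixed.append(name)
        else:
            fixed.append(prefix + name)
    return fixed


new_headers = fix_numeric_headers(["1", "2.5", "sample_3", "X4"])
print(new_headers)
```

Only the names that would be read as numbers are touched, so headers that are already valid strings pass through as they are.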
thanks shaun and joshua for all the running around on this - im making views now!
Closing. Please re-open if there are issues.