nemoarchive / analytics

Repository for the NeMO Analytics project.
MIT License

cant upload to NeMO for cortical development profile #114

Closed carlocolantuoni closed 4 years ago

carlocolantuoni commented 4 years ago

after getting the meta data in, during the expression file upload: i have tried to upload 2 different files repeatedly and am getting: "Oops! File upload failed. Try again and contact us if this continues." one was ~270MB, the other ~170MB - is there a timeout? or size limit? something else?

hertzron commented 4 years ago

Carlo, how many cells in each of these datasets? As a test can you keep the cell number but limit the metadata to the minimal necessary for your displays and analyses?

jorvis commented 4 years ago

Carlo - when we built the new server we failed to increase the file size upload limit initially on the web server. I have fixed this and did a quick demo upload. Please try again.

carlocolantuoni commented 4 years ago

i am editing the CarloTEMP profile in NeMO Curator account - when ready this should be moved up to be the default profile for the Cortical Development profile.Joshua and i interacted last night and he was able to get the static tSNE visualization to work in a way that i can bring in new data sets in that visualization. these data have been in: /autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/PrimateDevoAtlassince march, but havent been pulled in, so for this short deadline i am re formatting them all for manual upload.i couldnt get the uploader to work last night. joshua just fixed the file size limit that was a problem for this. now its telling me "Oops! No observations sheet found. Expected spreadsheet sheet named 'observations'."but there is an observations tab. - any idea what to try here? joshua - can u run the chron job to grab them from the dir with all the data in it?or should we continue to trouble shoot with the manual uploader?

fyi - this is not single cell data. it is laser captured samples on microarray from the allen brain.there are 1093 samples in the smaller one and 1855 in the larger. only 12000 rows/genes. -- Carlo

carlocolantuoni commented 4 years ago

sorry for the formatting on that msg - here is the same text parsed into readable sentences:

i am editing the CarloTEMP profile in NeMO Curator account - when ready this should be moved up to be the default profile for the Cortical Development profile. Joshua and i interacted last night and he was able to get the static tSNE visualization to work in a way that i can bring in new data sets in that visualization.

The data im trying to upload now have been in: "/autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/PrimateDevoAtlas" since march, but havent been pulled in, so for this short deadline i am re formatting them all for manual upload. i couldnt get the uploader to work last night. but joshua just fixed the file size limit that was a problem for this.

now its telling me "Oops! No observations sheet found. Expected spreadsheet sheet named 'observations'." but there is an observations tab. - any idea what to try here?

joshua - can u run the cron job to grab them from the dir with all the data in it? or should we continue to troubleshoot with the manual uploader?

fyi - this is not single cell data. it is laser captured samples on microarray from the allen brain. there are 1093 samples in the smaller one and 1855 in the larger. only 12000 rows/genes.

carlo
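
One way to double-check the "No observations sheet found" message is to list the sheet names exactly as they are stored in the workbook, since a trailing space or different capitalization can be hard to spot in a spreadsheet program. The following is a minimal sketch that assumes the upload is an .xlsx workbook; the function names `list_sheet_names` and `near_matches` are made up for illustration and are not part of the NeMO or gEAR uploader, and whether the uploader compares names exactly is also an assumption.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the workbook part of an .xlsx file.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_sheet_names(xlsx_file):
    """Return the sheet names stored in an .xlsx file (path or file-like object)."""
    with zipfile.ZipFile(xlsx_file) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.findall("main:sheets/main:sheet", NS)]


def near_matches(names, expected="observations"):
    """Return names that only match `expected` after trimming and lowercasing.

    An empty list means either an exact match exists or nothing is close.
    """
    if expected in names:
        return []
    return [name for name in names if name.strip().lower() == expected]
```

Calling `near_matches(list_sheet_names("metadata.xlsx"))` (the file name here is a placeholder) would return something like `['Observations ']` if the tab exists but is not named exactly `observations`.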

hertzron commented 4 years ago

Carlo, Yang is happy to help you.

jorvis commented 4 years ago

Do I need to run the NeMO uploader cron? If so I can do that, but I don't know whether the datasets you have put there have made it through the indexing steps that make them visible in the files the cron reads. Shaun would have to help check there.

JO

adkinsrs commented 4 years ago

@carlocolantuoni I just did a quick check for that directory and while the /autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain exists, the "PrimateDevoAtlassince" sub-directory does not exist. Checked our "processing" and "release" areas and same deal.

carlocolantuoni commented 4 years ago

Sorry guys, there was a typo in the path, just drop the "since" from the end

carlocolantuoni commented 4 years ago

If we can run the cron that would b great. Does that include the automatic processing that gets things ready for the static tSNE, joshua?

carlocolantuoni commented 4 years ago

Here are 5 sub directories in /autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/PrimateDevoAtlas. Each contains a data set that would be good to get into NeMO for this week.

hertzron commented 4 years ago

Hi Carlo, Did you check that the column headers match the convention that Yang mentioned? Best, Ronna

carlocolantuoni commented 4 years ago

if the cron job can run we will avoid all the reformatting issues, so i think thats the way to go here if we can - joshua - is that possible?

jorvis commented 4 years ago

@adkinsrs I have only run the google-cloud side after @apaala runs the IGS-side script here:

https://github.com/nemoarchive/analytics/blob/master/cron_uploader/nemo_upload_crawler.py

In the code there it references a config file I'm not familiar with:

conf_loc = os.path.join(os.path.dirname(__file__), '.conf.ini')
if not os.path.isfile(conf_loc):
    sys.exit("Config file could not be found at {}".format(conf_loc))

Have you run this before or know the values that should be here?

adkinsrs commented 4 years ago

@jorvis Yes, the .conf_ini file was an editable configuration file that I added to the path of the cron_uploader on the IGS filesystem. I previously noticed that nemo_upload_crawler.py had some hardcoded values, and that nemo_gcloud_processor.py duplicated much of what is in the former script.

However due to security concerns I did not commit the .conf_ini file to github. Where are you running the script from? If you are not running it over the IGS servers, I can paste the configuration for you on slack or elsewhere?
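
For anyone setting this up fresh, here is a minimal sketch of how a script could load such a file and fail early when values are missing. It reuses the file-existence check quoted above; the section and key names in `REQUIRED` are hypothetical placeholders, since the real contents of the config file are not shown in this thread.

```python
import configparser
import os
import sys

# Hypothetical section/key names, for illustration only.
REQUIRED = {"paths": ["incoming_dir"], "gcloud": ["bucket"]}


def load_config(conf_loc, required=None):
    """Read an INI file and exit with a message if it or any required key is missing."""
    if not os.path.isfile(conf_loc):
        sys.exit("Config file could not be found at {}".format(conf_loc))

    config = configparser.ConfigParser()
    config.read(conf_loc)

    # Collect every (section, key) pair that is expected but absent.
    missing = [
        (section, key)
        for section, keys in (required or {}).items()
        for key in keys
        if not config.has_option(section, key)
    ]
    if missing:
        sys.exit("Config file {} is missing: {}".format(conf_loc, missing))
    return config
```

Keeping the file out of version control, as described above, and committing only a template with empty values is one common way to document the expected keys without exposing the values.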

jorvis commented 4 years ago

I was running it on IGS servers. Sending a copy in Slack is fine.

carlocolantuoni commented 4 years ago

Thanks for stayin on this guys!

jorvis commented 4 years ago

@carlocolantuoni Shaun and I looked here but the directory you referenced doesn't seem to exist:

/autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/PrimateDevoAtlas

carlocolantuoni commented 4 years ago

I dont kno what im getting wrong in the path. Its gotta b under:

/autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/

Shaun didnt u say u got into this dir? Wats in there? All the dirs in there are NeMO data.

I will chk wen im bak at a cpu in 30min.

Carlo

adkinsrs commented 4 years ago

@carlocolantuoni

When I do ls -l /autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/ I get:

drwxr-sr-x. 2 ccolantuoni nemo  568 Mar 29 02:39 BrainSpanBulkDevo
drwxr-sr-x. 2 ccolantuoni nemo 2350 Jun 20 17:19 NHPforeBrainDevoAllenLMD.AllRegion
drwxr-sr-x. 2 ccolantuoni nemo  483 Mar 31 02:32 NHPforeBrainDevoAllenLMD.AmygThal
drwxr-sr-x. 2 ccolantuoni nemo  599 Jun 20 17:17 NHPforeBrainDevoAllenLMD.CtxDevo
drwxr-sr-x. 2 ccolantuoni nemo  490 Mar 31 02:34 NHPforeBrainDevoAllenLMD.CtxLayers
drwxr-sr-x. 2 ccolantuoni nemo  462 Mar 31 02:35 NHPforeBrainDevoAllenLMD.Hippo

I also did a find /autofs/encrypted/NEMO/incoming/brain -type d -name "*PrimateDevoAtlas*" and did not come up with any results either.

carlocolantuoni commented 4 years ago

yes - all those except "BrainSpanBulkDevo" are to go into NeMO ( BrainSpanBulkDevo is in already)

carlocolantuoni commented 4 years ago

sorry about the incorrect path - all these are to go into NeMO:

drwxr-sr-x. 2 ccolantuoni nemo 2350 Jun 20 17:19 NHPforeBrainDevoAllenLMD.AllRegion
drwxr-sr-x. 2 ccolantuoni nemo  483 Mar 31 02:32 NHPforeBrainDevoAllenLMD.AmygThal
drwxr-sr-x. 2 ccolantuoni nemo  599 Jun 20 17:17 NHPforeBrainDevoAllenLMD.CtxDevo
drwxr-sr-x. 2 ccolantuoni nemo  490 Mar 31 02:34 NHPforeBrainDevoAllenLMD.CtxLayers
drwxr-sr-x. 2 ccolantuoni nemo  462 Mar 31 02:35 NHPforeBrainDevoAllenLMD.Hippo

adkinsrs commented 4 years ago

@carlocolantuoni Fortunately the files are already bundled via the NeMO Archive ingest process, so I will do the upload process now. Will post again when it completes.

carlocolantuoni commented 4 years ago

Great! Thanks! Sorry again for the path run-around.

adkinsrs commented 4 years ago

Hi @carlocolantuoni

Unfortunately none of the 3tab files were able to be uploaded because the EXPmeta.JSON file did not validate against the "Metadata Validator" module in gEAR. Specifically the following fields are required and were left blank in the JSON files:

If you can provide me this information I can populate these fields and rerun

carlocolantuoni commented 4 years ago

ENSEMBL 19 Ed Lein EdL@alleninstitute.org

carlocolantuoni commented 4 years ago

Thanks!

carlocolantuoni commented 4 years ago

Same info for all 5 datasets

seth-ament commented 4 years ago

Hi Carlo, The annotation release refers to the version of the ENSEMBL transcripts (current = v94), rather than to the version of the reference genome. Seth

carlocolantuoni commented 4 years ago

Thnx - can u put that in shaun? Sori

adkinsrs commented 4 years ago

@seth-ament or @carlocolantuoni I will put in v94 for ENSEMBL if that is fine

carlocolantuoni commented 4 years ago

yes - thnx

adkinsrs commented 4 years ago

Hit another snag.

The uploader script is failing validation because the taxon ID (9544 - Macaca mulatta) does not exist in the gEAR database "organism" table. @jorvis We should add that in the database for gEAR and NeMO analytics.

In the meantime I am going to assign it the "organism" table ID of 8. The nemo_upload_crawler.py script validates against a dictionary of organisms to table IDs in the script rather than retrieve the organisms from the gEAR or NeMO Analytics database, so I am going to add this ID to the dictionary so this will validate.
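
To illustrate the kind of check being described, here is a minimal sketch of validating a taxon ID against an in-script dictionary, plus a helper that builds the same dictionary from database rows instead. Only the 9544 to 8 pairing comes from this comment; the human entry, the function names, and the `(organism_id, taxon_id)` row layout are assumptions for illustration and are not taken from nemo_upload_crawler.py.

```python
# Taxon ID -> "organism" table ID.
TAXON_TO_ORGANISM_ID = {
    9606: 2,  # Homo sapiens (table ID assumed here, for illustration only)
    9544: 8,  # Macaca mulatta (table ID assigned in the comment above)
}


def build_mapping(rows):
    """Build the taxon -> table ID mapping from (organism_id, taxon_id) rows.

    `rows` stands in for the result of a query against the "organism" table;
    the column order is a hypothetical choice.
    """
    return {int(taxon_id): int(organism_id) for organism_id, taxon_id in rows}


def organism_id_for(taxon_id, mapping=TAXON_TO_ORGANISM_ID):
    """Return the table ID for a taxon ID, or raise ValueError if it is unknown."""
    key = int(taxon_id)
    if key not in mapping:
        raise ValueError("Taxon ID {} is not in the organism mapping".format(key))
    return mapping[key]
```

Passing a mapping produced by `build_mapping` keeps the validation in step with the database, so a newly added organism does not also require a code change.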

carlocolantuoni commented 4 years ago

I think we can use homo sapiens here - the gene symbols and ensmbl gene ids i am using are human, so go ahead and change it.

adkinsrs commented 4 years ago

Update:

I am still having difficulties uploading the data. The data can be converted into H5AD, but one of the steps in reading the H5AD files is failing because it thinks that the DataMTX.tab file has 1 more gene than the ROWmeta.tab file. I believe it is erroneously trying to index the "DataRowNames" as a gene, but I am unsure how to resolve this.

It's honestly getting very close for me to go to bed, so I don't know how much longer I can go at this.

carlocolantuoni commented 4 years ago

thanks shaun - go to bed - ill work on other routes to get it and others in

seth-ament commented 4 years ago

Thanks for all your help, Shaun! Agreed -- get some sleep!

@carlocolantuoni fine to get these initial datasets uploaded by another approach as a "quick fix", but I don't want to give up on finishing the work that Shaun and Joshua have been helping with this weekend to get the data automatically uploaded. We will need to work through these issues in order to build out the integration of NeMO Archive and Analytics, and that's pretty central to the project.

adkinsrs commented 4 years ago

I think I may have gotten it to go through! Made an AnnData dataframe subview without the troublesome row, and the H5AD file was created and stuff is now uploading to @jorvis's GCP bucket.

Will have to do some kind of rewrite on that part of the script to utilize dataframe subviews to avoid a similar error in the future.
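
For reference, the off-by-one can be checked before conversion by comparing the row identifiers of the two tab files. This is a plain-Python sketch using only the csv module, not the AnnData subview approach described above, and it assumes both files are tab-delimited with the row identifier in the first column; the function names are made up for illustration.

```python
import csv


def row_ids(lines, skip_header):
    """Return the first-column values from tab-delimited lines.

    `lines` can be an open file or any iterable of strings.
    """
    reader = csv.reader(lines, delimiter="\t")
    if skip_header:
        next(reader, None)  # discard the header row, if there is one
    return [row[0] for row in reader if row]


def unmatched(matrix_ids, meta_ids):
    """Return matrix row IDs that have no counterpart in the row metadata."""
    known = set(meta_ids)
    return [row_id for row_id in matrix_ids if row_id not in known]


def drop_rows(ids, unwanted):
    """Return `ids` without the unwanted entries, preserving order."""
    unwanted = set(unwanted)
    return [row_id for row_id in ids if row_id not in unwanted]
```

If the matrix header is read as data (`skip_header=False`) while the metadata header is skipped, `unmatched` would return the header label, which would be consistent with the suspicion that "DataRowNames" is being indexed as a gene.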

seth-ament commented 4 years ago

Fantastic, Shaun!

carlocolantuoni commented 4 years ago

u r a hero @adkinsrs i totally agree @seth-ament

carlocolantuoni commented 4 years ago

is there another step once things are in the GCP bucket? i am not seeing the data in NeMO curator when i search in the data set manager.

jorvis commented 4 years ago

I have to process them from the bucket. Working on it.

carlocolantuoni commented 4 years ago

guys, thanks again for hittin this so much today

seth-ament commented 4 years ago

I think there’s still one step that Joshua has to do manually to make it available to curate. Hopefully he can get to that in the morning.

jorvis commented 4 years ago

Shaun and I are working on it.

jorvis commented 4 years ago

Carlo should be able to see them now:

mysql> select id, date_added, title, is_public from dataset where owner_id = 483;
+--------------------------------------+---------------------+------------------------------------+-----------+
| id                                   | date_added          | title                              | is_public |
+--------------------------------------+---------------------+------------------------------------+-----------+
| 0017f74a-924a-4e55-86f6-674b5d98d5e6 | 2020-06-22 14:57:58 | NHPforeBrainDevoAllenLMD.Hippo     |         1 |
| 61f42eef-30f7-4263-b630-6e78bdeb47e0 | 2020-06-22 15:21:37 | NHPforeBrainDevoAllenLMD.CtxLayers |         1 |
| 8319fdef-5aea-4af8-b1ac-fc4b59b5ceef | 2020-06-22 15:21:38 | NHPforeBrainDevoAllenLMD.AllRegion |         1 |
| d83a2fda-7806-4ddf-ba92-e46317bbb998 | 2020-06-22 15:21:39 | NHPforeBrainDevoAllenLMD.CtxDevo   |         1 |
| e56788e1-eb7b-48ce-a22d-dde67ac281f0 | 2020-06-22 15:21:41 | NHPforeBrainDevoAllenLMD.AmygThal  |         1 |
+--------------------------------------+---------------------+------------------------------------+-----------+
5 rows in set (0.00 sec)

carlocolantuoni commented 4 years ago

Any luck finding where the observations were lost?

jorvis commented 4 years ago

Not yet. Will continue after our pre-meeting meeting in 17 minutes.

jorvis commented 4 years ago

Carlo, please try the dataset now which ends with 'Hippo'. The issue was that all the obs column names are numeric, which scanpy/pandas balks at. I created a utility to correct these and then used other steps to manually fix and upload the new h5. If this one works for you I'll do the same on the other four. My notes:

$ cut -f 1,2,3,4,6- NHPforeBrainDevoAllenLMD.Hippo_ROWmeta.tab > foo
$ mv foo NHPforeBrainDevoAllenLMD.Hippo_ROWmeta.tab
$ ~/git/gEAR/bin/h5ad_fix_numeric_headers.py -i ./ -f NHPforeBrainDevoAllenLMD.Hippo -c X
$ ~/git/gEAR/bin/h5ad_convert_from_3tab.py -i ./ -f NHPforeBrainDevoAllenLMD.Hippo -o ./NHPforeBrainDevoAllenLMD.Hippo.h5ad
$ ~/git/gEAR/bin/add_ensembl_ids_to_h5ad.py -i NHPforeBrainDevoAllenLMD.Hippo.h5ad -o 0017f74a-924a-4e55-86f6-674b5d98d5e6.h5ad -org 2 -er 92
$ gcloud compute scp 0017f74a-924a-4e55-86f6-674b5d98d5e6.h5ad nemo-prod-202006:/home/jorvis/git/gEAR/www/datasets/
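
The header fix can be pictured with a short sketch like the one below, which prefixes purely numeric column names with a letter. This is not the code in h5ad_fix_numeric_headers.py; the function name is made up, and reading the `-c X` argument in the notes as such a prefix is an assumption.

```python
import re

# Matches names that are purely numeric, such as "12", "-3" or "2.5".
NUMERIC_NAME = re.compile(r"-?\d+(\.\d+)?")


def fix_numeric_headers(names, prefix="X"):
    """Return column names with `prefix` added to any purely numeric name."""
    fixed = []
    for name in names:
        text = str(name).strip()
        if NUMERIC_NAME.fullmatch(text):
            fixed.append(prefix + text)
        else:
            fixed.append(text)
    return fixed
```

Names that already contain a letter are left unchanged, so running the function a second time on its own output has no further effect.
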
carlocolantuoni commented 4 years ago

thanks shaun and joshua for all the running around on this - im making views now!

jorvis commented 4 years ago

Closing. Please re-open if there are issues.