October Release Testing Feedback

ssarrafan commented 3 years ago

This issue is to document and track testing feedback and fixes in one place

ssarrafan commented 3 years ago

Testing feedback from today is noted in the testing worksheet and there is a Google doc with screenshots and other details.

[x] Two Gold IDs for all samples in Brodie study (already reported on Slack)
[x] Spruce data isn’t subsetting properly; see row 6 of the testing spreadsheet and page 1 of the Google doc for a screenshot
[x] Faceted search is not working: subsetting does not work for the Spruce data (PI name Schadt)
[x] Funding sources needs to be a whole block of text, not a single line; see page 2 of the Google doc for a screenshot
[x] Can you please add a PI image on the study page for the new Spruce study. Photo on page 3 of Google doc.
[x] Link to user guide at the top of the site as discussed

subdavis commented 3 years ago

Thanks, @ssarrafan, I'm working on these issues now, except for two:

Two Gold IDs for all samples in Brodie study (already reported on Slack)

This is a metadata problem, so I think it needs to be corrected in Mongo. Is that right?

Can you please add a PI image on the study page for the new Spruce study. Photo on page 3 of Google doc.

Profile image is part of the NMDC schema: PersonValue.profile_image_url -- our server doesn't host images. This image will need to be added to the principal_investigator object of the study.
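For illustration, a minimal sketch of where that field would sit on a study record. Only the names `principal_investigator` and `profile_image_url` come from the comment above; the study id, the `name` key, the image URL, and the helper `with_pi_image` are hypothetical placeholders, not the actual Mongo document or update procedure.

```python
import copy

# Hypothetical study record. Only the field names principal_investigator and
# profile_image_url are taken from the discussion; everything else is a placeholder.
study = {
    "id": "example:study-0001",  # placeholder, not a real NMDC identifier
    "principal_investigator": {
        "name": "Example PI",  # placeholder key and value
    },
}


def with_pi_image(study: dict, image_url: str) -> dict:
    """Return a copy of the study with the PI's profile_image_url set."""
    updated = copy.deepcopy(study)
    # Create the principal_investigator object if the study lacks one.
    updated.setdefault("principal_investigator", {})["profile_image_url"] = image_url
    return updated


# Placeholder URL; the real image location is decided by whoever hosts it.
updated_study = with_pi_image(study, "https://example.org/profile_images/example_pi.jpg")
```

The portal would then read the URL from the study's `principal_investigator` object rather than from a file hosted on the portal server.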

ssarrafan commented 3 years ago

@dwinston or @dehays can one of you please add Dr. Schadt's photo from the third page of this Google doc to NERSC or wherever the PI images need to live?

dehays commented 3 years ago

I created a Chris Schadt image file and added it to cori.nersc.gov:/global/cfs/cdirs/m3408/www/profile_images

I'll use a change sheet to update the PI image url value on the SPRUCE study.

dehays commented 3 years ago

Mongo has been updated with Chris Schadt profile image url as PI of SPRUCE study. This should be visible after the next portal ingest.

ssarrafan commented 3 years ago

Thank you @dehays

subdavis commented 3 years ago

The rest of the items in the list should have been addressed. Please LMK if any other changes are needed.

ssarrafan commented 3 years ago

@dwinston please see GH issue https://github.com/microbiomedata/nmdc-runtime/issues/40

pvangay commented 3 years ago

Not sure if this is an artifact of other issues, but the links out to the NCBI/EBI BioSample records don't look correct for at least a few samples I checked. For example, for "Riverbed sediment microbial communities from areas with no vegetation in Columbia River, Washington, USA - GW-RW N1_1020", the BioSample accession should be SAMN06267121, but the link currently points to a fish sample. The IMG and GOLD links seem OK.
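A check of this kind could be sketched as follows. This is only an assumption about how one might spot-check the links, not how the portal builds them: the URL pattern is the public NCBI BioSample address format, and `biosample_url` and `link_matches` are hypothetical helper names.

```python
import re

# Public NCBI BioSample record address; assumed here as the expected link target.
NCBI_BIOSAMPLE_URL = "https://www.ncbi.nlm.nih.gov/biosample/{accession}"

# BioSample accessions start with SAMN (NCBI), SAME/SAMEA (EBI), or SAMD (DDBJ).
ACCESSION_RE = re.compile(r"^SAM(?:N|EA?|D)\d+$")


def biosample_url(accession: str) -> str:
    """Build the expected NCBI link for a BioSample accession."""
    if not ACCESSION_RE.match(accession):
        raise ValueError(f"not a BioSample accession: {accession!r}")
    return NCBI_BIOSAMPLE_URL.format(accession=accession)


def link_matches(accession: str, href: str) -> bool:
    """True if the link's last path segment is exactly the expected accession."""
    return href.rstrip("/").endswith("/" + accession)


expected = biosample_url("SAMN06267121")
```

Comparing each sample's displayed accession against the last path segment of its outbound link would flag mismatches like the one described above.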

pvangay commented 3 years ago

5 Brodie samples have a different ENVO classification compared to the rest of Brodie's samples.

The rest of Brodie's samples share a single classification, which I can verify in GOLD (these 5 samples don't exist in GOLD and must be EMSL-only samples). In the latest spreadsheet, all Brodie samples have the same GOLD classification, so I would presume the ENVO terms should be identical too. It's unclear to me which ENVO classification is correct, or whether these 5 samples should be removed altogether (I thought we weren't showing EMSL-only samples, but I could be wrong).

subdavis commented 3 years ago

meta: I can usually tell when something needs my attention, but please tag me by name if there's something specifically for me to do.

ssarrafan commented 3 years ago

@subdavis Here's the document with the screenshots and issues with downloading that Karen reported. I tried the first file that she tried (1781_100351.filtered.fastq.gz.download) and it's still trying to download and appears to be stuck. I20211012_ssues_with_downloading_from_portal_kd.docx

I tried the same thing in production instance and it's much faster.

ssarrafan commented 3 years ago

@subdavis here's a screenshot that shows the difference in how fast it's downloading... the top file is from production and started later and the third one down is from dev and started sooner.

subdavis commented 3 years ago

Thank you for the detailed report. I can reproduce this now.

2021/10/13 01:28:59 [error] 11#11: *113 upstream prematurely closed connection while reading upstream, client: 128.55.212.127, server: localhost, request: "GET /data/1781_100351/qa/1781_100351.filtered.fastq.gz HTTP/1.1", upstream: "http://10.42.8.144:8080/1781_100351/qa/1781_100351.filtered.fastq.gz", host: "data.microbiomedata.org", referrer: "https://data.dev.microbiomedata.org/"

From the data container, I see a 206 return code.

128.55.206.110 - - [13/Oct/2021:01:30:58 +0000] "GET /1781_100351/qa/1781_100351.filtered.fastq.gz HTTP/1.0" 206 871207475 "https://data.microbiomedata.org/?q=ChQIABACGAIiDCJNZXRhZ2Vub21lIg==" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.58 Safari/537.36" "216.63.191.253"

I can see that each request fails (for me) after exactly 1 GB has been downloaded. If the download is resumed, it runs until exactly 2 GB has been downloaded.

I suspect a proxy configuration issue. I'm going to recruit some help from another member of the team tomorrow.
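For anyone reproducing this, the resume behavior relies on HTTP range requests, which is why the data container logs a 206 (Partial Content). A minimal sketch of resuming by hand, using only the Python standard library; the function names are hypothetical, and the example byte offset assumes "1 GB" means 2^30 bytes, which the logs above do not confirm.

```python
import os
import urllib.request
from typing import Dict, Tuple


def range_header(bytes_received: int) -> Dict[str, str]:
    """Header asking the server for everything from bytes_received onward."""
    return {"Range": f"bytes={bytes_received}-"}


def parse_content_range(value: str) -> Tuple[int, int, int]:
    """Parse a 'bytes START-END/TOTAL' Content-Range value into integers."""
    unit, _, rest = value.strip().partition(" ")
    if unit != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    span, _, total = rest.partition("/")
    start, _, end = span.partition("-")
    return int(start), int(end), int(total)


def resume_download(url: str, dest: str, chunk_size: int = 1 << 20) -> int:
    """Append the missing tail of url to dest and return the resulting file size."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(have))
    with urllib.request.urlopen(request) as response:
        # 206: the server honored the Range header, so append.
        # 200: the server ignored it and is sending the whole file, so start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(dest)


# Assumed cut-off of 2**30 bytes, purely for illustration.
header_after_first_failure = range_header(2**30)
```

Calling `resume_download` repeatedly and noting the file size after each call would show whether every attempt stops at the same increment.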

ssarrafan commented 3 years ago

I just tried the same download from Karen's first example in production. It works fine and downloaded quickly, so the good news is that there are no problems with downloads in production. The issue is only on dev.

zachmullen commented 3 years ago

If you tried it within the last 10 minutes, that's because I was testing a fix and it appears to be working :)

Brandon is now helping me get it deployed in a permanent fashion, should be done soon.

ssarrafan commented 3 years ago

Closing this as October release is in prod now

microbiomedata / nmdc-server

October Release Testing Feedback #537