kids-first / kf-portal-ui

:bar_chart: The Kids First Data Resource Portal and Social Network User Interface
Apache License 2.0

Can't push CBTTC files to Cavatica #1725

Closed baileyckelly closed 4 years ago

baileyckelly commented 5 years ago

Study: Pediatric Brain Tumors - CBTTC

It looks like you can't push any of this study's files to Cavatica even though the user has access.

I have access to other dbGaP studies (congenital heart defects, etc.) that I can push to Cavatica just fine. The portal says I have access to the CBTTC files & I can download the files directly. It's only the push to Cavatica that is having issues.

Steps to reproduce (assuming your accounts are connected to eRA Commons & Cavatica):

  1. Select any file(s) from CBTTC.
  2. Click the analyze in Cavatica button.
  3. Select a project that you want to push them to & initiate the transfer.
  4. The portal says success. However, when you navigate to the project in Cavatica the files are not there.

Possible Causes (speculation):

  1. Issue with the portal & the transfer of this study. (Note: this is our only non-dbGaP study. It does not have a phs #, but that hasn't been an issue previously.)
  2. Issue on the Cavatica side.
  3. Issue on the data side (We have not updated anything in the data service recently so this is unlikely...)

baileyckelly commented 5 years ago

@aalex Let me know if you need any other details on this guy.

aalex commented 5 years ago

First, we must check whether the problem is on SBG's side.

baileyckelly commented 5 years ago

Also looping in @mattions

aalex commented 5 years ago

We have reproduced the bug and it's confirmed. Cavatica answers with OK, but the files don't appear there. We have tried with different file formats for another study, and they all work.

123.png

Portal_-_User_Dashboard.png

baileyckelly commented 5 years ago

After some more poking around... I'm wondering if there is something going on with the sync to Cavatica.

We ran a release & published for CBTTC on 5/2/19. It was a very very small change - We updated the outcome data for one participant. No changes were made to files. When I ran that release, all task services were on including the SBG sync.

Just to cover our bases, I ran another release for CBTTC this afternoon. That seems to have solved most of the problems, which doesn't make a lot of sense since we have not touched any of the files. It's not 100% fixed though, because there are some files that are still not copying over.

I went through and tested each file type for CBTTC. Below is the number of files that actually copied over vs. the total the portal is reporting:

  - Radiology Reports: 585/594
  - Radiology Images: 604/615
  - Unaligned Reads: 836/836
  - gvcf: 52/884
  - Pathology Reports: 1006/1015
  - Operation Reports: 1056/1066
  - Histology Images: 3577/3620
  - Aligned Reads: 3786/5514
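To make the gaps easier to compare, the counts above can be turned into missing-file totals and copy percentages. This is only a small sketch: the numbers are taken from the comment, and the function name `summarize` is made up for illustration.

```python
# Counts reported in the comment above:
# (files that copied over to Cavatica, total reported by the portal).
copy_counts = {
    "Radiology Reports": (585, 594),
    "Radiology Images": (604, 615),
    "Unaligned Reads": (836, 836),
    "gvcf": (52, 884),
    "Pathology Reports": (1006, 1015),
    "Operation Reports": (1056, 1066),
    "Histology Images": (3577, 3620),
    "Aligned Reads": (3786, 5514),
}


def summarize(counts):
    """Return (data type, missing files, percent copied) rows, largest gap first."""
    rows = []
    for data_type, (copied, total) in counts.items():
        rows.append((data_type, total - copied, round(100 * copied / total, 1)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for data_type, missing, pct in summarize(copy_counts):
    print(f"{data_type:<18} missing {missing:>5}  ({pct}% copied)")
```

Aligned reads and gvcfs account for 2,560 of the 2,642 missing files, which matches the impression that the problem is concentrated in the genomic files rather than spread evenly.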

mattions commented 5 years ago

hi there,

It seems that you are able to push these files, the API says it's ok, but they do not show up in the project. I've managed to replicate the problem.

Let me pull in @trbica, who was looking at this specifically, to see if he has figured anything out.

@baileyckelly my understanding is that the sync works up to a certain point, but we miss some specific files. Is that what you see? Do you have the url of the latest release run?

mattions commented 5 years ago

I've just pushed the first three files from CBTTC, picked at random:

Screenshot from 2019-05-14 11-51-57

and they do appear on Cavatica:

Screenshot from 2019-05-14 11-52-07

@baileyckelly and @aalex would it be possible to have a list of (some of) the files that do not copy over, so we can better pinpoint why these particular files are not copied?

Is there any property of these files you can think of that would explain why they are not being synced properly?

mattions commented 5 years ago

One more piece of information:

We have migrated our service from a previous CI to a new CI, and @trbica is working on fixing some issues that have arisen there. That's a blocker for seeing what happened to the syncing service. We do not know yet if there are bugs there, but first we solve that, and then we can take a look at the other.

Still, pinpointing which files did not get synced could help identify why not all of them get synced...

baileyckelly commented 5 years ago

I’ll pull a list as soon as I get in the office, but it’s the majority of the gvcfs, as I was only able to copy over 52 of the 800+, and a good chunk of the bams. I’m not super convinced it’s a specific file type. The files that I tried before I redid the sync yesterday afternoon worked after the sync.

baileyckelly commented 5 years ago

Here is a manifest of the gVCFs that are not copying over (note - there are more files than just these that are not copying over - this was just the easiest list to pull because of the sheer volume).

I created this by:

  1. Selecting CBTTC as the study in the file repo
  2. Selecting "gvcfs" from the data type in the file filters
  3. Copying all 884 to a new Cavatica project
  4. Going to the Cavatica project, where I saw only 52/884 files, and downloading the Cavatica manifest
  5. Downloading a file manifest of the gvcfs from the file repo
  6. Bringing the BS ID from the Cavatica manifest over to the file repo manifest and doing a vlookup

The attached file is the resultant spreadsheet with the vlookup in column K. I currently have it filtered on the #N/A's, which are the files that are in the file repo but did not make it to Cavatica. If you remove the filter, you'll see the ones that did copy over. The sheet has the GF ID in it. Let me know if you need me to pull in the latest_did from the database.

missing_in_cavatica.xlsx
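For anyone who wants to repeat the comparison without a spreadsheet, the vlookup step above is roughly a set difference on the ID column. The sketch below assumes both manifests are exported as CSV and share an ID column; the column name `biospecimen_id`, the function names, and the sample rows are all made up for illustration and are not the real manifest layout.

```python
import csv
import io


def read_ids(manifest_file, id_column):
    """Collect the non-empty values of id_column from a CSV manifest."""
    reader = csv.DictReader(manifest_file)
    return {
        (row.get(id_column) or "").strip()
        for row in reader
        if (row.get(id_column) or "").strip()
    }


def missing_in_cavatica(repo_file, cavatica_file, id_column="biospecimen_id"):
    """Return IDs present in the file repo manifest but absent from Cavatica."""
    repo_ids = read_ids(repo_file, id_column)
    cavatica_ids = read_ids(cavatica_file, id_column)
    return sorted(repo_ids - cavatica_ids)


# Tiny made-up manifests; real exports have more columns and other headers.
repo_csv = io.StringIO(
    "biospecimen_id,file_name\n"
    "BS_AAA,a.g.vcf.gz\n"
    "BS_BBB,b.g.vcf.gz\n"
    "BS_CCC,c.g.vcf.gz\n"
)
cavatica_csv = io.StringIO("biospecimen_id,name\nBS_AAA,a.g.vcf.gz\n")

print(missing_in_cavatica(repo_csv, cavatica_csv))
```

Like the vlookup, this matches on a single ID column, so it assumes that column identifies one file within the selected data type.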

mattions commented 5 years ago

Yeah, I get the same. OK, thanks. This is currently being investigated.

trbica commented 5 years ago

I have fixed one bug that was preventing us from reading all the data from dataservice! I also found some bugs in dataservice: the total field shows a number that does not match the results, e.g. a listing returns only one result while total shows 590 or something like that. Now, if we encounter that kind of bug during import, the task will fail and the coordinator will be informed! I hope the next release will be successful! Cheers
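The check described above (fail the import when the reported total does not match what was actually read) could look something like the following. This is not the sync service's code, only a sketch of the idea: `fetch_page`, `read_all`, `make_fetcher`, `InconsistentTotalError`, and the `results`/`total` response shape are assumptions made for illustration.

```python
class InconsistentTotalError(RuntimeError):
    """Raised when a listing's reported total disagrees with the rows received."""


def read_all(fetch_page, page_size=100):
    """Read every row from a paginated listing, failing loudly on a bad total.

    fetch_page(offset, limit) is a hypothetical callable that returns a dict
    with a "results" list and a "total" integer.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results = page["results"]
        rows.extend(results)
        offset += len(results)
        if len(results) < page_size:
            break  # a short (or empty) page means the listing is exhausted
    if page["total"] != len(rows):
        raise InconsistentTotalError(
            f"listing reported total={page['total']} but returned {len(rows)} rows"
        )
    return rows


def make_fetcher(items, reported_total):
    """Build a fake fetch_page over an in-memory list, for demonstration only."""
    def fetch_page(offset, limit):
        return {"results": items[offset:offset + limit], "total": reported_total}
    return fetch_page


print(read_all(make_fetcher(["a", "b", "c"], 3), page_size=2))
try:
    read_all(make_fetcher(["only-one"], 590))
except InconsistentTotalError as err:
    print("import task would fail:", err)
```

Raising instead of silently continuing is the point: a partial read then surfaces as a failed task rather than as files quietly missing downstream.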

surendhar-bioinformatics commented 4 years ago

Hi, I encountered the same problem today: I can't push CBTTC files to Cavatica. Is there any solution for this problem?