icgc-dcc / dcc-portal

Data portal for exploring and accessing data
https://dcc.icgc.org/

SSM data download becomes unavailable when logged in #589

Open hknahal opened 5 years ago

hknahal commented 5 years ago

The open-access portion of the simple somatic mutation (SSM) data download becomes unavailable when a user is logged into the Data Portal. A user recently contacted us about this (see ticket at https://extsd.oicr.on.ca/projects/ICGCSD/queues/custom/11/ICGCSD-2518): they could not download simple somatic mutation data at https://dcc.icgc.org/donors/DO232224 while logged into the Data Portal. I can confirm their OpenID is correct and they do have access to controlled data. When I tried to replicate their issue by going to https://dcc.icgc.org/donors/DO232224 while not logged in and clicking on "Download Donor Data", I was able to see "Simple Somatic Mutation" in the list:

[screenshot: not_logged_in]

But if I am logged into the Data Portal, the "Simple Somatic Mutation" entry disappears from the list:

[screenshot: logged_in]

This donor does have (open-access) SSM data, so it should be available to download whether the user is logged in or not.

hknahal commented 5 years ago

@andricDu Just some more details. It looks like this bug is only affecting the PBCA-US project. When I'm not logged into the Portal and go to https://dcc.icgc.org/releases/current/Projects/PBCA-US, I can see the "simple_somatic_mutation.open.PBCA-US.tsv.gz" file:

[screenshot: not-logged-in]

But if I log in, the simple_somatic_mutation file disappears:

[screenshot: logged-in]

Expected behaviour:

The controlled version of the SSM file (i.e. "simple_somatic_mutation.controlled.PBCA-US.tsv.gz") should always exist if a user is logged into the Portal. By default:

So even if a project does not have any masked SSMs, the "simple_somatic_mutation.controlled.[ICGC-PROJECT].tsv.gz" file will still appear when the user is logged in and it will basically be a copy of the open version.
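To make the expectation above concrete, here is a minimal sketch of the naming rule as described in this comment. It is not the portal's actual code: the function name `expected_ssm_file` and the `logged_in` flag are hypothetical, and only the two file name patterns come from this thread.

```python
def expected_ssm_file(project: str, logged_in: bool) -> str:
    """Return the SSM file name a user would expect to see for a project.

    Hypothetical helper: it only encodes the expectation stated in this
    issue (controlled file when logged in, open file otherwise).
    """
    tier = "controlled" if logged_in else "open"
    return f"simple_somatic_mutation.{tier}.{project}.tsv.gz"


# Anonymous users expect the open file; logged-in users expect the
# controlled file, even if it is just a copy of the open version.
print(expected_ssm_file("PBCA-US", logged_in=False))
print(expected_ssm_file("PBCA-US", logged_in=True))
```

Under this rule, a listing for PBCA-US that shows neither file to a logged-in user does not match the expectation.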

I know we apply different masking rules to US projects (TARGET and TCGA). Since PBCA-US is the only US project that is not TARGET or TCGA, is it possible that we treated this project differently during the SSM masking step, and somehow the controlled version of the SSM file is not being made available?

rosibaj commented 5 years ago

After a discussion with Dusan, this is acceptable. Those files were deleted manually from the database.

When you are logged in and have DACO access, the portal gives you the UNMASKED donors. This data does not exist for the kidsfirst donors.

This is not a BUG - We need to investigate how to show the open access mutations by connecting to the production HDFS.

rosibaj commented 5 years ago

Similar issue reported by Lincoln, in the releases section:

Here's some bad behaviour in the data releases page, regarding when controlled and open tier data are displayed:

  1. Before logging in to the ICGC portal, go to https://dcc.icgc.org/releases/PCAWG/consensus_snv_indel
  2. View the list of files. It shows one open tier file (screenshot 1).
  3. Log in to the portal, then reload the page. Now it just shows the controlled tier files (screenshot 2).

The result is that when I log in, I can see the controlled tier files but not the open tier ones!!
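The comparison in these steps can be sketched as a small check over two file listings, one captured before login and one after. This is only an illustration: `hidden_after_login` is a hypothetical helper, the listings are typed in by hand rather than fetched from the portal, and the controlled file name below is a placeholder, not a real file.

```python
def hidden_after_login(anonymous_files, logged_in_files):
    """Return files visible before login that are missing after login."""
    visible_when_logged_in = set(logged_in_files)
    return sorted(f for f in anonymous_files if f not in visible_when_logged_in)


# Listing seen before login (file name taken from this thread).
before = ["final_consensus_snv_indel_passonly_icgc.open.tgz"]
# Listing seen after login (placeholder name, for illustration only).
after = ["placeholder_controlled_file.tgz"]

print(hidden_after_login(before, after))
```

A non-empty result means an open tier file is being hidden from logged-in users, which is the behaviour reported here.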

Question: Can we adjust this behavior so that:

Analysis

We cannot show the open and controlled files together in the same directory; this would require a different design of the system.

Some options we can explore are:

  1. Remove the .open from some file names so that they still show up when the user logs in
  2. [image]
  3. Add more description to the 403 forbidden - needs to be checked across portal. { "code": 403, "message": "Forbidden - Please login or check permissions to access this resource." }
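
As a rough illustration of option 3, a more descriptive 403 body could distinguish between anonymous and logged-in users. This is a sketch under assumptions, not the portal's error handling: the function `forbidden_body` is hypothetical, and the wording of the logged-in message is invented for the example. Only the anonymous message matches the body quoted above.

```python
import json


def forbidden_body(logged_in: bool) -> str:
    """Build a JSON 403 body; the logged-in wording is an assumption."""
    if logged_in:
        # Hypothetical wording for users who are already logged in.
        message = ("Forbidden - You are logged in but do not have "
                   "permission to access this resource.")
    else:
        # Wording taken from the body quoted in this comment.
        message = ("Forbidden - Please login or check permissions "
                   "to access this resource.")
    return json.dumps({"code": 403, "message": message})


print(forbidden_body(logged_in=False))
```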

christinayung commented 4 years ago

  1. There's only 1 file in the PCAWG directory that needs to be renamed. Please remove .open from PCAWG/consensus_snv_indel/final_consensus_snv_indel_passonly_icgc.open.tgz

  2. While adding verbiage to Having trouble downloading?, please update the link to http://docs.icgc.org/download/repositories so users don't get that annoying redirect message.
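
For the rename in item 1, the intended name change can be sketched as a plain string operation. The helper `drop_open_marker` is hypothetical and only computes the new name; how the file is actually renamed in the release storage is not covered here.

```python
def drop_open_marker(filename: str) -> str:
    """Remove the first ".open." component from a file name, if present."""
    return filename.replace(".open.", ".", 1)


print(drop_open_marker("final_consensus_snv_indel_passonly_icgc.open.tgz"))
```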

rosibaj commented 4 years ago

@christinayung tracking of tasks mentioned here: https://github.com/icgc-dcc/dcc-portal/issues/647