lgatto / rpx

R Interface to the ProteomeXchange Repository
http://lgatto.github.io/rpx/

Wrong FTP URL in the XML of a dataset #5

Closed: lgatto closed this issue 1 year ago

lgatto commented 4 years ago

In the XML of a dataset, the cvParam PRIDE:0000411 is used to provide the FTP URL of the data. For example:

> library(rpx)
> px <- PXDataset("PXD000001")
> pxurl(px)
 [1] "ftp://ftp.pride.ebi.ac.uk/2012/03/PXD000001"

which now throws an error in downstream processing. The data is now at a different URL:

ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2012/03/PXD000001/

ypriverol commented 4 years ago

This is what I see in the XML of that project:

<FullDatasetLinkList>
    <FullDatasetLink>
        <cvParam accession="PRIDE:0000411" cvRef="PRIDE" value="ftp://ftp.pride.ebi.ac.uk/2012/03/PXD000001" name="Dataset FTP location"/>
    </FullDatasetLink>
</FullDatasetLinkList>

URL: http://central.proteomexchange.org/cgi/GetDataset?ID=PXD000001-9&outputMode=XML&test=no

lgatto commented 4 years ago

Yes, but now the data is at ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2012/03/PXD000001/

ypriverol commented 4 years ago

I see the issue. I will try to fix it tomorrow morning.

lgatto commented 4 years ago

Thanks.

lgatto commented 4 years ago

@ypriverol - could you update me here whenever there is news regarding this issue? The rpx package is failing because of this, and I will need to fix it sooner rather than later.

ypriverol commented 4 years ago

I will update you when the issue is fixed. We have some other network issues at EBI we are trying to deal with. I will keep you posted.

lgatto commented 4 years ago

Any news, @ypriverol ?

lgatto commented 4 years ago

@ypriverol - any news on this? If this situation is going to persist for an undetermined time, could you please let me know so that I can take other measures to sort out the resulting errors on my end?

ypriverol commented 4 years ago

We are still struggling to sync our repository with ProteomeXchange. We are trying to solve the issues, but I cannot give you a date.

lgatto commented 4 years ago

Ok, thank you for the update.

ebrombacher commented 4 years ago

Hi,

In some cases, such as PXD001584, the location path in the XML already includes "pride/data/archive":

<FullDatasetLink>
    <cvParam cvRef="PRIDE" accession="PRIDE:0000411" name="Dataset FTP location" value="ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/01/PXD001584"/>
</FullDatasetLink>

So, after the update, "pride/data/archive" is duplicated in the resulting URL:

> px <- PXDataset("PXD001584")
> pxurl(px)
[1] "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/pride/data/archive/2015/01/PXD001584"
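One way to avoid the duplication is to normalise the URL rather than blindly prepend the archive prefix. The sketch below assumes that every affected URL starts with `ftp://ftp.pride.ebi.ac.uk/` and that the correct layout is `.../pride/data/archive/<year>/<month>/<accession>`; `fix_pride_url()` is a hypothetical helper for illustration, not part of rpx.

```r
# Hypothetical helper, NOT part of rpx: map a PRIDE FTP URL onto the
# archive layout, adding "pride/data/archive/" exactly once.
fix_pride_url <- function(url) {
  root   <- "ftp://ftp.pride.ebi.ac.uk/"
  prefix <- "pride/data/archive/"
  url <- sub("/+$", "", url)                 # drop any trailing slash
  if (!startsWith(url, root)) return(url)    # leave non-PRIDE URLs untouched
  rel <- substring(url, nchar(root) + 1)     # path relative to the FTP root
  # Strip every leading copy of the prefix, then add exactly one back
  while (startsWith(rel, prefix)) rel <- substring(rel, nchar(prefix) + 1)
  paste0(root, prefix, rel)
}
```

With this approach, both the legacy URL of PXD000001 and the duplicated URL of PXD001584 end up as `ftp://ftp.pride.ebi.ac.uk/pride/data/archive/<year>/<month>/<accession>`.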

Cheers

lgatto commented 4 years ago

Thank you @ebrombacher.

@ypriverol - any updates on this issue?

ebrombacher commented 4 years ago

Thank you! Is there any news on this, @lgatto @ypriverol?

ypriverol commented 4 years ago

@lgatto @ebrombacher We have fixed the problem inside PRIDE, but we are still trying to push the fix into ProteomeXchange: the schema has changed and we are now working with them on it. This has priority in PRIDE, but it will take a couple more weeks to be fixed.

lgatto commented 4 years ago

Thank you @ypriverol - please keep us posted here and I'll update the package accordingly.

lgatto commented 4 years ago

@ebrombacher in the meantime, if you install the GitHub version (1.23.1), you will be able to do the following:

> px <- PXDataset("PXD001584")
> pxurl(px)
[1] "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/pride/data/archive/2015/01/PXD001584"
> rpx:::apply_fix_issue_5(FALSE)
> pxurl(px)
[1] "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/01/PXD001584"

ebrombacher commented 4 years ago

@lgatto @ypriverol Great, thanks so much for your help!

lgatto commented 3 years ago

This should now be handled automatically by the software, which tests multiple possible URLs.
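The idea of testing several candidate URLs can be sketched as follows. This is not rpx's actual implementation: `resolve_pride_url()` is a hypothetical function, and `is_reachable` is an assumed user-supplied predicate (in practice it would query the server, for instance via `RCurl::url.exists()`). The URL as announced in the XML is tried first, then a variant rewritten to the archive layout, so both old-style and new-style entries resolve without duplicating the prefix.

```r
# Hypothetical sketch, NOT rpx's implementation: try the URL exactly as
# announced in the XML, then a variant rewritten to the archive layout,
# and return the first candidate accepted by `is_reachable`.
resolve_pride_url <- function(url, is_reachable) {
  root <- "ftp://ftp.pride.ebi.ac.uk/"
  # Insert the archive prefix after the FTP root (first occurrence only)
  rewritten <- sub(root, paste0(root, "pride/data/archive/"), url, fixed = TRUE)
  for (candidate in unique(c(url, rewritten))) {
    if (isTRUE(is_reachable(candidate))) return(candidate)
  }
  NA_character_  # no candidate reachable, e.g. server temporarily unavailable
}
```

Returning `NA` when nothing answers keeps "wrong URL" and "resource temporarily unavailable" indistinguishable from the caller's point of view, which is consistent with the same error still appearing during outages.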

ebrombacher commented 3 years ago

@lgatto Perfect, thank you!

lgatto commented 3 years ago

@ebrombacher - note that you might still see the same error, as it can also be thrown when the resource is temporarily unavailable (which has been happening lately).