nih-sparc / sparc.client

Python client for NIH SPARC
https://docs.sparc.science/docs/sparc-python-client
Apache License 2.0

Download multiple files from Pennsieve #12

Closed Kayvv closed 1 year ago

Kayvv commented 1 year ago

The download_file function in pennsieve.py appears to take a list of dictionaries as input, but it raises an error when I pass a list with more than one dictionary. It would also be useful to add a test case for downloading multiple files to test_pennsieve.py.

nickerso commented 1 year ago

@athril @jwagenaar - just wondering if you have any thoughts on this one?

athril commented 1 year ago

Hi @Kayvv, can you provide a reproducible example of your error?

Kayvv commented 1 year ago

Hi @athril, here's an example that reproduces the issue:

def test_download_files():
    file_list = [
        {'name': '5HT_cellDensity_5.json',
         'datasetId': 292,
         'datasetVersion': 1,
         'size': 534547,
         'fileType': 'Json',
         'packageType': 'Unsupported',
         'icon': 'JSON',
         'uri': 's3://prd-sparc-discover-use1/292/1/files/derivative/5-HT/5HT_cellDensity_5.json',
         'createdAt': None,
         'sourcePackageId': 'N:package:b3df91b7-bb85-4647-a106-02208ea07945'},
        {'name': '5HT_cellDensity_Layout1_view.json',
         'datasetId': 292,
         'datasetVersion': 1,
         'size': 316,
         'fileType': 'Json',
         'packageType': 'Unsupported',
         'icon': 'JSON',
         'uri': 's3://prd-sparc-discover-use1/292/1/files/derivative/5-HT/5HT_cellDensity_Layout1_view.json',
         'createdAt': None,
         'sourcePackageId': 'N:package:c0028722-74fc-4ec5-b5c6-5c3335f9d49a'}
    ]
    p = PennsieveService(connect=False)
    p.download_file(file_list=file_list)

When I run this test I get a TypeError:

E TypeError: expected str, bytes or os.PathLike object, not dict

However, it works fine if I run them individually like this:

def test_download_files():
    file_list = [
        {'name': '5HT_cellDensity_5.json',
         'datasetId': 292,
         'datasetVersion': 1,
         'size': 534547,
         'fileType': 'Json',
         'packageType': 'Unsupported',
         'icon': 'JSON',
         'uri': 's3://prd-sparc-discover-use1/292/1/files/derivative/5-HT/5HT_cellDensity_5.json',
         'createdAt': None,
         'sourcePackageId': 'N:package:b3df91b7-bb85-4647-a106-02208ea07945'}
    ]
    p = PennsieveService(connect=False)
    p.download_file(file_list=file_list)

    file_list = [
        {'name': '5HT_cellDensity_Layout1_view.json',
         'datasetId': 292,
         'datasetVersion': 1,
         'size': 316,
         'fileType': 'Json',
         'packageType': 'Unsupported',
         'icon': 'JSON',
         'uri': 's3://prd-sparc-discover-use1/292/1/files/derivative/5-HT/5HT_cellDensity_Layout1_view.json',
         'createdAt': None,
         'sourcePackageId': 'N:package:c0028722-74fc-4ec5-b5c6-5c3335f9d49a'}
    ]
    p.download_file(file_list=file_list)

PS: I got these two file dictionaries by running p.list_files(limit=2, dataset_id=292).

While I was writing the documentation for zinchelper, I found that the same error also appears in the 12th code block of tutorial.ipynb:

https://github.com/nih-sparc/sparc.client/blob/main/docs/tutorial.ipynb
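Until the function handles longer lists, the per-file workaround described above can be wrapped in a small helper. This is only a sketch: download_each and RecordingService are hypothetical names that are not part of sparc.client, and the stand-in service exists purely so the snippet runs without network access. It assumes nothing about download_file beyond the file_list keyword used in the examples above.

```python
from typing import Any, Dict, List


def download_each(service: Any, file_list: List[Dict[str, Any]]) -> List[Any]:
    """Call service.download_file once per entry, passing a one-element list each time."""
    return [service.download_file(file_list=[entry]) for entry in file_list]


class RecordingService:
    """Hypothetical stand-in for PennsieveService; it records calls instead of downloading."""

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    def download_file(self, file_list: List[Dict[str, Any]]) -> int:
        # Remember which file names were requested in this call.
        self.calls.append([entry["name"] for entry in file_list])
        return len(file_list)


if __name__ == "__main__":
    service = RecordingService()
    download_each(
        service,
        [
            {"name": "5HT_cellDensity_5.json"},
            {"name": "5HT_cellDensity_Layout1_view.json"},
        ],
    )
    print(service.calls)
```

With the real client, the same helper would be called as download_each(p, file_list), where p is the PennsieveService instance from the tests above.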

athril commented 1 year ago

Thank you, @Kayvv, I was able to reproduce your error. It is caused by not providing the output_name parameter to the download_file() function. In any case, this function should be made more robust to prevent similar errors in the future. I will submit a patch next week, possibly with a test to cover this use case.
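One way such a guard could look is sketched below. It is an illustration only and not necessarily what the patch does: resolve_output_name is a hypothetical helper, and the fallback archive name "download.gz" is an assumption made for the example. The idea is to choose a string output name up front, so that a dictionary can never reach code that expects a path.

```python
import os
from typing import Any, Dict, List, Optional, Union


def resolve_output_name(
    file_list: List[Dict[str, Any]],
    output_name: Optional[Union[str, "os.PathLike[str]"]] = None,
    default_archive: str = "download.gz",  # assumed fallback name, hypothetical
) -> str:
    """Return a string output name for one or more file dictionaries."""
    if not file_list:
        raise ValueError("file_list must contain at least one file dictionary")
    if output_name is None:
        # A single file keeps its own name; several files share one archive name.
        if len(file_list) == 1:
            return file_list[0]["name"]
        return default_archive
    # os.fspath raises TypeError for anything that is not str, bytes or os.PathLike.
    return os.fspath(output_name)
```

A caller would compute the name first and pass it on explicitly, for example p.download_file(file_list=file_list, output_name=resolve_output_name(file_list)).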

athril commented 1 year ago

Closed with PR #16