phac-nml / irida

Canada’s Integrated Rapid Infectious Disease Analysis Platform for Genomic Epidemiology
https://irida.ca
Apache License 2.0

Fix file processor running on files that are on unfinished upload runs. #1506

Closed JeffreyThiessen closed 11 months ago

JeffreyThiessen commented 11 months ago

Description of changes

Added a filtering step to SequencingObjectProcessingService so that files belonging to a sequencing run that is not in the COMPLETE state are not picked up for processing.
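To illustrate the rule described above, here is a minimal sketch in Python. It is not the actual change, which lives in the Java SequencingObjectProcessingService; the class names, field names, and the `ready_for_processing` helper are hypothetical and chosen only for this example. It assumes a file with no associated sequencing run (for example, a Web GUI upload) stays eligible, as the test steps describe.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SequencingRun:
    # Hypothetical stand-in for the run's upload state, e.g. "UPLOADING" or "COMPLETE"
    upload_status: str


@dataclass
class SequencingObject:
    identifier: int
    # None models a file uploaded without a sequencing run (e.g. via the Web GUI)
    sequencing_run: Optional[SequencingRun] = None


def ready_for_processing(objects: Iterable[SequencingObject]) -> List[SequencingObject]:
    """Keep objects that have no run, or whose run is in the COMPLETE state."""
    return [
        obj
        for obj in objects
        if obj.sequencing_run is None
        or obj.sequencing_run.upload_status == "COMPLETE"
    ]


queue = [
    SequencingObject(1, SequencingRun("UPLOADING")),  # skipped: upload not finished
    SequencingObject(2, SequencingRun("COMPLETE")),   # kept: run is finished
    SequencingObject(3),                              # kept: no run attached
]
print([obj.identifier for obj in ready_for_processing(queue)])  # [2, 3]
```

Filtering at the point where work is selected means a skipped file is simply reconsidered on a later pass, once its run has been marked COMPLETE.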

Related issue

Fixes #1505

Fixes the race condition in which FastQC runs on files before they are fully uploaded.

How to test changes
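The upload portion of the procedure in this section (create a run, upload a file, then mark the run complete) can also be wrapped in one helper so the check is easy to repeat. This is a minimal sketch, assuming an `api` object that exposes the three calls used in the steps (`create_seq_run`, `send_sequence_files`, `set_seq_run_complete`). The helper name `upload_then_complete` and the `pause` hook are hypothetical and not part of iridauploader.

```python
def upload_then_complete(api, sequence_file, sample_name, project_id, pause=None):
    """Upload one file on a fresh sequencing run, then mark the run COMPLETE.

    `pause` is a hypothetical hook that is called while the run is still in
    the UPLOADING state, for example to wait and confirm in IRIDA that FastQC
    has not started yet.
    """
    # Create the run; it stays in the UPLOADING state until explicitly completed
    run_id = api.create_seq_run(
        metadata={'layoutType': 'SINGLE_END'},
        sequencing_run_type='miseq',
    )
    # Upload the file against that run
    response = api.send_sequence_files(sequence_file, sample_name, project_id, run_id)
    # Give the tester a chance to inspect IRIDA before the run is completed
    if pause is not None:
        pause(run_id)
    # Mark the run COMPLETE so the file becomes eligible for processing
    api.set_seq_run_complete(run_id)
    return run_id, response
```

With the objects from the steps that follow, a call such as `upload_then_complete(a, sf, s_name, p_id, pause=lambda run_id: input("Check IRIDA, then press Enter"))` would stop between the upload and the completion of the run.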

  1. Run IRIDA
  2. In IRIDA, create a new project and a new sample in that project. Make note of the Project_ID and Sample Name
  3. Check out, build, and import the iridauploader codebase into a python3 interpreter
    cd irida-uploader
    git pull origin development
    make
    source .virtualenv/bin/activate
    python3
  4. Use the libraries to make a sequencing run, and upload a file
    
    import iridauploader
    from iridauploader import api
    # make an api instance of IRIDA
    # if you built irida with dev db seed, the following creds should work
    a = api.ApiCalls("sequencer", "N9Ywc6GKWWZotzsJGutj3BZXJDRn65fXJqjrk29yTjI", "http://localhost:8080/api/", "admin","Password1!")
    # test connection
    a.get_irida_version()

    # make a sequencing file
    # a valid file which will pass FastQC can be found in the irida-uploader source:
    # irida-uploader/examples/directory_run/file_1.fastq.gz
    sf = iridauploader.model.SequenceFile(['/path/to/a/fastq.gz/file/mysample.fastq.gz'])

    # create a new sequencing run in IRIDA
    run_id = a.create_seq_run(metadata={'layoutType': 'SINGLE_END'}, sequencing_run_type='miseq')
    # Go to the IRIDA ADMIN panel to view your sequencing run: http://localhost:8080/admin/sequencing-runs

    # Use your project id and sample name from before
    p_id = 1  # Note: this should be an int
    s_name = 'my_sample'  # Note: this should be a string

    # upload the data
    a.send_sequence_files(sf, s_name, p_id, run_id)
    # response should look something like this
    # {'resource': {'file': '/tmp/irida/sequence-files/45/1/valid.fastq.gz', 'createdDate': 1702507892000, 'modifiedDate': 1702507892000, 'uploadSha256': None, 'fileName': 'valid.fastq.gz', 'label': 'valid.fastq.gz', 'fileSizeBytes': 864, 'links': [{'rel': 'sample/sequenceFiles', 'href': 'http://localhost:8080/api/samples/140/sequenceFiles'}, {'rel': 'self', 'href': 'http://localhost:8080/api/samples/140/unpaired/24/files/45'}, {'rel': 'sample', 'href': 'http://localhost:8080/api/samples/140'}, {'rel': 'sequenceFile/sequencingObject', 'href': 'http://localhost:8080/api/samples/140/unpaired/24'}], 'identifier': '45'}}

  5. On IRIDA, see that the sequencing run is still in UPLOADING state
  6. On IRIDA, wait a few minutes and see that the file has not been processed by FastQC
  7. Via the python interpreter, set the sequencing run to COMPLETE

    ```python
    a.set_seq_run_complete(run_id)
    ```

  8. On IRIDA, see that the sequencing run is in COMPLETE state
  9. On IRIDA, assuming your dev environment is set up correctly, see that FastQC has run on the sample. (Can be seen in the dev output logs too)
  10. Upload a file to a sample via the Web GUI. See that FastQC runs on it, as there is no associated Sequencing Run

Checklist

Things for the developer to confirm they've done before the PR should be accepted: