openpipelines-bio / openpipeline

https://openpipelines.bio
MIT License

Add LSI #552

Open SarahOuologuem opened 8 months ago

SarahOuologuem commented 8 months ago

Changelog

Added LSI component

Issue ticket number and link

#398

Checklist before requesting a review

DriesSchaumont commented 7 months ago

Hi @SarahOuologuem, thanks for opening this PR, and thanks @VladimirShitov for the review! I read through it and left some comments with thoughts on some of the conversations. Let me know if I can be of more help to keep this PR moving forward!

DriesSchaumont commented 7 months ago

Hi @SarahOuologuem, I noticed that you implemented tests, which is really great! However, the test data has not yet been uploaded to our test S3 bucket. Could you provide me with a link so that I can download the data (assuming it is public)? I will put it in our bucket. Otherwise, we could quickly connect on Slack. Thanks :)

VladimirShitov commented 6 months ago

A general recommendation: it would be great to have more descriptive commit messages, for example "Change tabulation" or "Remove spaces" instead of "Small fixes". That would make it possible to quickly understand what happened without diving deeper into the code.

rcannood commented 4 months ago

Hi Sarah!

Have you tried running viash test src/dimred/lsi/config.vsh.yaml?

I get:


=================================== FAILURES ===================================
______________________ test_select_highly_variable_column ______________________

tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0')

    def test_select_highly_variable_column(tmp_path):
        output_path = tmp_path / "output_lsi.h5mu"

        # run component
        cmd_args = [
        meta["executable"],
         "--input", str(input_path),
         "--output", str(output_path),
         "--var_input", "highly_variable"
        ]
>       subprocess.run(cmd_args, check=True)

tmp/viash-run-lsi-UhQpoQ.py:81: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_...5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0/output_lsi.h5mu', '--var_input', ...],)
kwargs = {}
process = <Popen: returncode: 1 args: ['/viash_automount/tmp/viash_test_lsi35782840702...>
stdout = None, stderr = None, retcode = 1

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0/output_lsi.h5mu', '--var_input', 'highly_variable']' returned non-zero exit status 1.

/usr/local/lib/python3.9/subprocess.py:528: CalledProcessError
----------------------------- Captured stdout call -----------------------------
2024-01-09 08:26:17,969 INFO     Reading /viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu.
2024-01-09 08:26:18,700 INFO     Using modality 'atac' and adata.X for LSI computation
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
  File "/tmp/viash-run-lsi-NUVvcM.py", line 93, in <module>
    adata_input_layer = subset_vars(adata_input_layer, par["var_input"])
  File "/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/subset_vars.py", line 17, in subset_vars
    raise ValueError(f"Requested to use .var column '{subset_col}' as a selection of genes, but the column is not available.")
ValueError: Requested to use .var column 'highly_variable' as a selection of genes, but the column is not available.
__________________________ test_selecting_input_layer __________________________

tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0')

    def test_selecting_input_layer(tmp_path):
        output_path = tmp_path / "output_lsi.h5mu"

        # run component
        cmd_args = [
            meta["executable"],
            "--input", str(input_path),
            "--output", str(output_path),
            "--num_components", "20",
            "--layer", "counts"
            ]
>       subprocess.run(cmd_args, check=True)

tmp/viash-run-lsi-UhQpoQ.py:136: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_...mu', '--output', '/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0/output_lsi.h5mu', '--num_components', ...],)
kwargs = {}
process = <Popen: returncode: 1 args: ['/viash_automount/tmp/viash_test_lsi35782840702...>
stdout = None, stderr = None, retcode = 1

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0/output_lsi.h5mu', '--num_components', '20', '--layer', 'counts']' returned non-zero exit status 1.

/usr/local/lib/python3.9/subprocess.py:528: CalledProcessError
----------------------------- Captured stdout call -----------------------------
2024-01-09 08:26:34,271 INFO     Reading /viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu.
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
  File "/tmp/viash-run-lsi-tOA0u2.py", line 80, in <module>
    raise ValueError(f"Layer '{par['layer']}' was not found in modality '{par['modality']}'.")
ValueError: Layer 'counts' was not found in modality 'atac'.
=============================== warnings summary ===============================
tmp/viash-run-lsi-UhQpoQ.py::test_lsi
tmp/viash-run-lsi-UhQpoQ.py::test_lsi
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_raises
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_raises
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
  /usr/local/lib/python3.9/site-packages/anndata/_core/anndata.py:453: PendingDeprecationWarning: The dtype argument will be deprecated in anndata 0.10.0
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tmp/viash-run-lsi-UhQpoQ.py::test_select_highly_variable_column - subp...
FAILED tmp/viash-run-lsi-UhQpoQ.py::test_selecting_input_layer - subprocess.C...
============= 2 failed, 6 passed, 8 warnings in 117.84s (0:01:57) ==============
====================================================================
ERROR! Only 0 out of 1 test scripts succeeded!
Unexpected error occurred! If you think this is a bug, please post
create an issue at https://github.com/viash-io/viash/issues containing
a reproducible example and the stack trace below.

Does the same error show up when you run it locally?
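
As a side note on reading the log above: `subprocess.run(cmd_args, check=True)` leaves `stdout` and `stderr` as `None`, so the `CalledProcessError` itself only reports the exit status, and the actual `ValueError` has to be dug out of the "Captured stderr call" section. One possible way to put the component's own message directly into the failure is to run with `capture_output=True`. The following is only a minimal sketch: `run_component` is a hypothetical helper name that is not part of this PR, and the failing command is a stand-in for the real executable.

```python
import subprocess
import sys


def run_component(cmd_args):
    """Run a command; on a non-zero exit, raise an error that includes its stderr.

    NOTE: `run_component` is a hypothetical helper for illustration only.
    """
    # capture_output=True collects stdout/stderr; text=True decodes them to str.
    result = subprocess.run(cmd_args, capture_output=True, text=True)
    if result.returncode != 0:
        raise AssertionError(
            f"Command failed with exit status {result.returncode}.\n"
            f"stderr:\n{result.stderr}"
        )
    return result


if __name__ == "__main__":
    # Stand-in for the component executable (an assumption): a Python
    # one-liner that fails with an uncaught ValueError, like the log above.
    failing_cmd = [
        sys.executable,
        "-c",
        "raise ValueError(\"Layer 'counts' was not found in modality 'atac'.\")",
    ]
    try:
        run_component(failing_cmd)
    except AssertionError as err:
        # The message now contains the child's traceback and ValueError text.
        print(err)
```

In a test, `run_component(cmd_args)` would replace the bare `subprocess.run(cmd_args, check=True)` call. The trade-off is that output is buffered until the process exits rather than streamed live.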

rcannood commented 3 months ago

Hi Sarah! Just checking in with this PR. When would you have some time to look at the issue I posted?

SarahOuologuem commented 3 months ago

Sorry for the very late reply! Yes, the errors make sense: I haven't checked the new test data and only ran the tests on my old test data. I'm currently drowning in work, especially because of exam season. Please feel free to correct it yourself to speed up the process! So sorry, I can't really say when I will have time to resolve the issue myself.

VladimirShitov commented 3 months ago

I can take it over :) Once I swim out of other work as well...