openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
725 stars 38 forks source link

[REVIEW]: fABBA: A Python library for the fast symbolic approximation of time series #6294

Closed editorialbot closed 8 months ago

editorialbot commented 10 months ago

Submitting author: !--author-handle-->@chenxinye<!--end-author-handle-- (Xinye Chen) Repository: https://github.com/nla-group/fABBA Branch with paper.md (empty if default branch): master Version: 1.2.1 Editor: !--editor-->@lrnv<!--end-editor-- Reviewers: @Karangupta1994, @allie-tatarian Archive: 10.5281/zenodo.10885652

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/bd028bbee6f04f9ec2ad0b1034997c61"><img src="https://joss.theoj.org/papers/bd028bbee6f04f9ec2ad0b1034997c61/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/bd028bbee6f04f9ec2ad0b1034997c61/status.svg)](https://joss.theoj.org/papers/bd028bbee6f04f9ec2ad0b1034997c61)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@Karangupta1994 & @allie-tatarian, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @lrnv know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @allie-tatarian

πŸ“ Checklist for @Karangupta1994

editorialbot commented 10 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 10 months ago
Software report:

github.com/AlDanial/cloc v 1.88  T=1.29 s (75.8 files/s, 270557.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               12          16359          74932         195168
Python                          29           1657           2655           3294
Jupyter Notebook                18              0          50227           2514
Cython                          12            337            494            561
Markdown                         7            138              0            349
reStructuredText                 9            244            236            152
TeX                              2             14              0            128
YAML                             4             40             24             83
CSS                              1             18             15             57
DOS Batch                        1              8              1             26
Bourne Shell                     1              2              0             18
make                             1              4              7              9
TOML                             1              0              0              3
-------------------------------------------------------------------------------
SUM:                            98          18821         128591         202362
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 10 months ago

Wordcount for paper.md is 875

editorialbot commented 10 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- 10.1007/s10618-020-00689-6 may be a valid DOI for title: ABBA: adaptive Brownian bridge-based symbolic aggregation of time series
- 10.1109/tit.1982.1056489 may be a valid DOI for title: Least squares quantization in PCM
- 10.1145/882082.882086 may be a valid DOI for title: A symbolic representation of time series, with implications for streaming algorithms
- 10.1145/3532622 may be a valid DOI for title: An efficient aggregation method for the symbolic representation of temporal data
- 10.1016/j.is.2023.102294 may be a valid DOI for title: ECG classification with learning ensemble based on symbolic discretization
- 10.1007/978-3-031-24378-3_4 may be a valid DOI for title: Fast Time Series Classification with Random Symbolic Subsequences
- 10.1007/s10618-007-0064-z may be a valid DOI for title: Experiencing SAX: a novel symbolic representation of time series
- 10.1109/seed55351.2022.00016 may be a valid DOI for title: Foreseer: Efficiently Forecasting Malware Event Series with Long Short-Term Memory
- 10.1016/j.ress.2023.109123 may be a valid DOI for title: Data-driven prognostics based on time-frequency analysis and symbolic recurrent neural network for fuel cells under dynamic load
- 10.1145/3448672 may be a valid DOI for title: A Framework for Generating Summaries from Temporal Personal Health Data

INVALID DOIs

- None
editorialbot commented 10 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

lrnv commented 10 months ago

πŸ‘‹πŸΌ @chenxinye @Karangupta1994 @allie-tatarian this is the review thread for the paper. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

as the top of a new comment in this thread.

These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#REVIEW_NUMBER so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@lrnv) if you have any questions/concerns.

chenxinye commented 10 months ago

@lrnv Thanks for the arrangement!

allie-tatarian commented 10 months ago

Hi all, I'm not sure if this is the best place to this but I don't think it makes sense to post this as an issue because I think it's a problem on my end. When I try to use fABBA, I am consistently getting a "This installation is not using Cython" warning even though I do have Cython installed. @chenxinye is this something you have seen before? I cannot for the life of me figure out how to troubleshoot this. The fABBA library is still mostly working for me, but sometimes I get different outputs than the examples.

lrnv commented 10 months ago

@allie-tatarian Different output than the examples surely looks like an issue (unless the examples are explicitely random ?). Problem in the installation can also be reported as issues.

@Karangupta1994 & @allie-tatarian, to guide you in the review, there is an automated checklist that you have to produce by commenting in this thread :

@editorialbot generate my checklist

I think the standard checkpionts includes stuff such as the need for descriptive installation guidelines if some requirements are "non-standard".

More generally, this is a very good place to discuss this thread is here for that purpose

allie-tatarian commented 10 months ago

Thanks @lrnv for the input, I opened an issue on the project page, but I think I am going to try using fABBA again on a different computer just in case it is an issue with the security settings on this one.

chenxinye commented 10 months ago

Hi @lrnv, I just sorted the doi in bib, many thanks!

chenxinye commented 10 months ago

@allie-tatarian

I just checked, found some output in README.md is consistent with the real output of software. I now corrected it! Please check.

chenxinye commented 10 months ago

Thanks for pointing the issues.

For the Cython issue, @lrnv @allie-tatarian, some users reported before, most of them have this issue since their machine have firewall or without VC++ installed. Could you please provide your machine specifications for me?

I checked my three machines, all worked properly. I guess you use Windows without VC++ installed?

allie-tatarian commented 10 months ago

Thanks @chenxinye, I do have VC++ installed, so I think the issue is with the firewall on my work computer. I will try everything again on my personal computer tomorrow and report back!

chenxinye commented 10 months ago

You would need to install fABBA using:

pip install fABBA

allie-tatarian commented 10 months ago

Thanks @chenxinye, like I mentioned in the issue I did install using pip. It is working fine on my personal computer, so I think the issue is that the security on my work computer is not meshing well with cython. I wonder if it's worth putting a note about that in the docs for fABBA - I do think the issue is with cython, not fABBA, but looking around online it looks like I'm not the only one who has run into this issue and I haven't seen anyone spell out the solution as concisely as you did here!

chenxinye commented 10 months ago

Hi @allie-tatarian Thanks for your suggestions! That's really a good idea, I will put a note there!

chenxinye commented 10 months ago

@allie-tatarian Just added in https://fabba.readthedocs.io/en/latest/quickstart.html ! Many thanks

allie-tatarian commented 9 months ago

Review checklist for @allie-tatarian

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

allie-tatarian commented 9 months ago

I did run into some issues when I was trying out the JABBA package, trying to follow the examples both in the README and the docs, but then I realized that JABBA isn't in the paper so I did the peer review following the functionality that is listed in the paper. @lrnv can you let me know if you think that is off-base?

lrnv commented 9 months ago

@allie-tatarian I'm confused, what is the difference between fABBA (the software we review here) and JABBA ? Functionalities of the reviewed software need to be checked up, even if they are not mentionned in the paper itself. e.g., you can check out the documentation, see if it works correctly, etc..

if you mean this one here : https://fabba.readthedocs.io/en/latest/api_reference.html#jabba it seems to be integral part of the submitted work.

@chenxinye is JABBA part of the submitted work here ?

chenxinye commented 9 months ago

Sorry for the confusion @lrnv @allie-tatarian

JABBA is an extension of ABBA/fABBA towards multiple or multivariate time series, it also works for univariate time series (enabling parallel computing!).

@allie-tatarian Which issue did you come across, did you follow this: https://fabba.readthedocs.io/en/latest/multivariate.html?

chenxinye commented 9 months ago

The method JABBA also works for Before running the example, you should download the UEA data: https://timeseriesclassification.com/

Or you can simply generate multivariate time series yourself via: np.random.randn(50, 30)

lrnv commented 9 months ago

@chenxinye Maybe one interesting improvements to your docs would be a synthesis of these discussions.

allie-tatarian commented 9 months ago

Thank you both for the clarification! Yes, @chenxinye, that is the code I was following. I downloaded the BasicMotions dataset, copied the code, and the only thing I changed was changing the directory in the first line to an absolute directory pointing to the dataset. When I try to run it, I get the error: AttributeError: module 'os' has no attribute 'sched_getaffinity' which gets called at line 28, when the code is trying to call fit_transform(). When I looked up the error, it said that sched_getaffinity only works on Linux machines, not Windows. The full error message is:

Traceback (most recent call last):
  File "c:\Users\willi\Documents\program_practice\fabba\jabba_docs.py", line 28, in <module>
    symbols_series = jabba1.fit_transform(mts)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\willi\AppData\Local\Programs\Python\Python312\Lib\site-packages\fABBA\jabba\jabba.py", line 262, in fit_transform
    self.fit(series, n_jobs=n_jobs, alphabet_set=alphabet_set)
  File "C:\Users\willi\AppData\Local\Programs\Python\Python312\Lib\site-packages\fABBA\jabba\jabba.py", line 289, in fit
    self.pieces = self.parallel_compress(series, n_jobs=n_jobs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\willi\AppData\Local\Programs\Python\Python312\Lib\site-packages\fABBA\jabba\jabba.py", line 317, in parallel_compress
    n_jobs = self.n_jobs_init(n_jobs, _max=len_ts)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\willi\AppData\Local\Programs\Python\Python312\Lib\site-packages\fABBA\jabba\jabba.py", line 655, in n_jobs_init
    n_jobs = len(os.sched_getaffinity(0))
                 ^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'os' has no attribute 'sched_getaffinity'
allie-tatarian commented 9 months ago

A separate but related issue I ran into when trying to run the code that is in the README in the JABBA section is that that code includes the lines:

from fABBA import loadData
train, test = loadData()

I can't find any documentation, either in the README or the documentation, about how loadData() works. The image compression section of the README has a similar issue - it introduces load_images(), but I can't find anything about how to use load_images in the README or the docs.

chenxinye commented 9 months ago

Hi @allie-tatarian @lrnv , thanks for mentioning that, I just add documentation as well as docstring for the loadData().

The updated document can be found in https://fabba.readthedocs.io/en/latest/api_reference.html.

Also I fix the JABBA issue in Windows, many thanks for raising this issue!

I will also continue to work on the improvements of the documentation.

allie-tatarian commented 9 months ago

@chenxinye Thanks for the quick update! I tried the code in the documentation and it worked perfectly. I also tried the code with loadData() from the README, and it works when I use "Beef" but not "BasicMotions." I checked the directory that the data should have been in, and the "Beef" data was there but not the BasicMotions folder. I added the BasicMotions folder I had downloaded, and I got this error:

  File "c:\Users\willi\Documents\program_practice\fabba\jabba_readme.py", line 5, in <module>
    train, test = loadData(name="BasicMotions")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\willi\AppData\Local\Programs\Python\Python312\Lib\site-packages\fABBA\load_datasets.py", line 47, in loadData
    return train, test
                  ^^^^
UnboundLocalError: cannot access local variable 'test' where it is not associated with a value

Also I can see now the documentation for load_images() too, thanks for clarifying how it works! I hadn't tried that code yet, that will be the next thing I do.

allie-tatarian commented 9 months ago

Quick update: The image compression/decompression code worked great, so the only issue I have left is when using loadData(name="BasicMotions").

chenxinye commented 9 months ago

Hi @allie-tatarian All issues should be fixed in version 1.2.1 now. Thank you so much!

allie-tatarian commented 9 months ago

Thanks @chenxinye, now when I try to use loadData(name="BasicMotions"), I get this error (I do NOT get an error when I try to run the code using loadData(name="Beef")...again, this is the JABBA example code in the README):

  File "c:\Users\willi\Documents\program_practice\fabba\jabba_readme.py", line 8, in <module>
    symbols = jabba.fit_transform(train)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\willi\AppData\Local\Programs\Python\Python312\Lib\site-packages\fABBA\jabba\jabba.py", line 267, in fit_transform
    self.fit(series, n_jobs=n_jobs, alphabet_set=alphabet_set)
  File "C:\Users\willi\AppData\Local\Programs\Python\Python312\Lib\site-packages\fABBA\jabba\jabba.py", line 294, in fit
    self.pieces = self.parallel_compress(series, n_jobs=n_jobs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\willi\AppData\Local\Programs\Python\Python312\Lib\site-packages\fABBA\jabba\jabba.py", line 367, in parallel_compress
    pieces = [p.get() for p in pieces]
              ^^^^^^^
  File "C:\Users\willi\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\pool.py", line 774, in get
    raise self._value
  File "C:\Users\willi\AppData\Local\Programs\Python\Python312\Lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "fABBA\\jabba\\compmem.pyx", line 10, in fABBA.jabba.compmem.compress
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
chenxinye commented 9 months ago

Hi @allie-tatarian, Thank you, I missed that. JABBA is used to process the 2-dimensional time series, the 3-dimensional one have not yet developed.

So the proper use is

from fABBA import JABBA
from fABBA import loadData
train, test = loadData(name="BasicMotions")
jabba = JABBA(tol=0.0005, init='agg', verbose=1)
symbols = jabba.fit_transform(train[0]) 
reconst = jabba.inverse_transform(symbols)

I have added note in the ([JABBA example code in the README)

chenxinye commented 9 months ago

related document also were updated. Many thanks!

allie-tatarian commented 9 months ago

Ok, great, thanks for the explanation @chenxinye! I also noticed that for the last two lines:

test_symbols, start_set = jabba.transform(test)
test_reconst = jabba.inverse_transform(test_symbols, start_set)

I also had to change jabba.transform(test) to jabba.transform(test[0]) to get it to work using BasicMotions.

Thanks for all of your help and quick updates!

chenxinye commented 9 months ago

@allie-tatarian

Exactly, thanks for mentioning it, I already added note in the code example for the transform() method.

Ok, great, thanks for the explanation @chenxinye! I also noticed that for the last two lines:

test_symbols, start_set = jabba.transform(test)
test_reconst = jabba.inverse_transform(test_symbols, start_set)

I also had to change jabba.transform(test) to jabba.transform(test[0]) to get it to work using BasicMotions.

Thanks for all of your help and quick updates!

lrnv commented 9 months ago

@Karangupta1994 it has been two weeks already, did you manage to find some time to start this review ? If you need a bit more time, I could setup automatic reminders for you if you want.

chenxinye commented 9 months ago

@editorialbot check references

editorialbot commented 9 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1007/s10618-020-00689-6 is OK
- 10.1109/tit.1982.1056489 is OK
- 10.1145/882082.882086 is OK
- 10.1145/3532622 is OK
- 10.1016/j.is.2023.102294 is OK
- 10.1007/978-3-031-24378-3_4 is OK
- 10.1007/s10618-007-0064-z is OK
- 10.1109/seed55351.2022.00016 is OK
- 10.48550/arXiv.2003.05672 is OK
- 10.1016/j.ress.2023.109123 is OK
- 10.1145/3448672 is OK

MISSING DOIs

- None

INVALID DOIs

- 10.5555/1953048.2078195 is INVALID
chenxinye commented 9 months ago

Hi @lrnv, looks like the other reviewer is still not involved since it has been a month. I guess his GitHub account is frozen?

lrnv commented 9 months ago

@chenxinye No I had an e-mail exchange with @Karangupta1994 and It looks like he's still on board. Why are you saying his accoutn is frozen ?

chenxinye commented 9 months ago

I just checked my DOI 10.5555/1953048.2078195 is correct, not sure why it reminds me invalid DOI.

lrnv commented 9 months ago

https://doi.org/10.5555/1953048.2078195

chenxinye commented 9 months ago

@chenxinye No I had an e-mail exchange with @Karangupta1994 and It looks like he's still on board. Why are you saying his accoutn is frozen ?

Sorry, I apologized the mistake! I only saw the account activity.

chenxinye commented 9 months ago

Hi @lrnv it seems like https://www.jmlr.org/papers/v12/pedregosa11a.html does not provide DOI, how do I deal with it? Many thanks!

lrnv commented 9 months ago

I think you just have to curate your bib file and remove the DOi field. At the link you provided, they give the following bib :

@article{JMLR:v12:pedregosa11a,
  author  = {Fabian Pedregosa and Ga{{\"e}}l Varoquaux and Alexandre Gramfort and Vincent Michel and Bertrand Thirion and Olivier Grisel and Mathieu Blondel and Peter Prettenhofer and Ron Weiss and Vincent Dubourg and Jake Vanderplas and Alexandre Passos and David Cournapeau and Matthieu Brucher and Matthieu Perrot and {{\'E}}douard Duchesnay},
  title   = {Scikit-learn: Machine Learning in Python},
  journal = {Journal of Machine Learning Research},
  year    = {2011},
  volume  = {12},
  number  = {85},
  pages   = {2825--2830},
  url     = {http://jmlr.org/papers/v12/pedregosa11a.html}
}

without a doi field.

chenxinye commented 9 months ago

@editorialbot check references

editorialbot commented 9 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1007/s10618-020-00689-6 is OK
- 10.1109/tit.1982.1056489 is OK
- 10.1145/882082.882086 is OK
- 10.1145/3532622 is OK
- 10.1016/j.is.2023.102294 is OK
- 10.1007/978-3-031-24378-3_4 is OK
- 10.1007/s10618-007-0064-z is OK
- 10.1109/seed55351.2022.00016 is OK
- 10.48550/arXiv.2003.05672 is OK
- 10.1016/j.ress.2023.109123 is OK
- 10.1145/3448672 is OK

MISSING DOIs

- None

INVALID DOIs

- None
chenxinye commented 9 months ago

I think you just have to curate your bib file and remove the DOi field. At the link you provided, they give the following bib :

@article{JMLR:v12:pedregosa11a,
  author  = {Fabian Pedregosa and Ga{{\"e}}l Varoquaux and Alexandre Gramfort and Vincent Michel and Bertrand Thirion and Olivier Grisel and Mathieu Blondel and Peter Prettenhofer and Ron Weiss and Vincent Dubourg and Jake Vanderplas and Alexandre Passos and David Cournapeau and Matthieu Brucher and Matthieu Perrot and {{\'E}}douard Duchesnay},
  title   = {Scikit-learn: Machine Learning in Python},
  journal = {Journal of Machine Learning Research},
  year    = {2011},
  volume  = {12},
  number  = {85},
  pages   = {2825--2830},
  url     = {http://jmlr.org/papers/v12/pedregosa11a.html}
}

without a doi field.

Many thanks, now sorted!

Karangupta1994 commented 9 months ago

Review checklist for @Karangupta1994

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

lrnv commented 8 months ago

Hey @Karangupta1994 how is this going ? Would you update us on review status ?