salesforce / Merlion

Merlion: A Machine Learning Framework for Time Series Intelligence
BSD 3-Clause "New" or "Revised" License

[FEATURE REQUEST] docker and colab example #21

Open mosheliv opened 2 years ago

mosheliv commented 2 years ago

Is your feature request related to a problem? Please describe. It's difficult to try Merlion because of a requirements collision. A Docker image or a maintained Colab notebook would solve this nicely. Currently, installing Merlion in Colab fails because of a pandas/statsmodels collision.

Describe the solution you'd like A Docker image or Colab notebook that just works.

paulkass commented 2 years ago

Could you elaborate on the colab pandas/statsmodels collisions that you're seeing? If I open a random colab notebook right now and try to play around with Merlion, am I bound to see these issues?
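One way to make such a report concrete is to list the installed versions of the packages suspected of colliding. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of Merlion, and the package names in the usage note are simply the ones mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions
```

In a Colab cell, `installed_versions(["pandas", "statsmodels", "salesforce-merlion"])` would show which versions are actually present before and after the install step.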

mosheliv commented 2 years ago

Apologies, I should have included a notebook. Here is one in a gist; hopefully it opens well: https://gist.github.com/mosheliv/766e087ef327a38fd2c2ab33c430c4d1/raw/2b0a512a9d893d6b52d3eedf024500670db8c567/0_forecastintro.ipynb

mosheliv commented 2 years ago

Just a few more comments: I also tried it with my own dataset, but no luck; it fails at a later step. At least one easily runnable example would be much appreciated.

mosheliv commented 2 years ago

And the other problems were mostly with the nbeats notebook.

mosheliv commented 2 years ago

I have managed to bypass this by running cd ts_datasets, which is not great. However, the Seattle Trail dataset does not have a download function, so the nbeats example fails.

aadyotb commented 2 years ago

@mosheliv thank you for surfacing this behavior. A solution to this issue is to use the following code to install:

! if [ ! -d Merlion ]; then git clone https://github.com/salesforce/Merlion; fi
!pip install -e Merlion/
!pip install -e Merlion/ts_datasets/

whereas you have been doing

! if [ ! -d Merlion ]; then git clone https://github.com/salesforce/Merlion; fi
! cd Merlion
! pip install -e .
! pip install -e ts_datasets/

It seems that Python gets confused and tries to import the package from the ts_datasets directory rather than from ts_datasets/ts_datasets. You can resolve this issue by removing ts_datasets from the immediate path. In other words, do cd .. rather than cd ts_datasets.
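The kind of confusion described above can be reproduced in isolation. The following is a minimal sketch, not a diagnosis of Merlion itself: it assumes the cause is that a project directory with the same name as the package it contains is picked up from the search path as an empty namespace package. The package name `demo_pkg_shadow` and the function `demo_shadowing` are hypothetical and exist only for this illustration.

```python
import importlib
import os
import sys
import tempfile

PKG = "demo_pkg_shadow"  # hypothetical name, chosen to avoid clashing with real packages


def demo_shadowing():
    """Import PKG with two different search roots and report whether VALUE is visible."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        # Layout mirrors <repo>/ts_datasets/ts_datasets/__init__.py:
        # an outer project directory that contains the real package of the same name.
        project = os.path.join(root, PKG)
        package = os.path.join(project, PKG)
        os.makedirs(package)
        with open(os.path.join(package, "__init__.py"), "w") as f:
            f.write("VALUE = 42\n")

        for label, search_dir in [("from_repo_root", root), ("from_project_dir", project)]:
            sys.modules.pop(PKG, None)
            importlib.invalidate_caches()
            sys.path.insert(0, search_dir)
            try:
                module = importlib.import_module(PKG)
                # From the repo root, the outer directory has no __init__.py, so it is
                # imported as a namespace package that lacks VALUE.
                results[label] = hasattr(module, "VALUE")
            finally:
                sys.path.remove(search_dir)
                sys.modules.pop(PKG, None)
    return results
```

Calling `demo_shadowing()` shows that the same import statement resolves to different things depending on which directory is searched first, which is why the working directory matters here.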

Would you mind elaborating on the issue with the SeattleTrail dataset? I understand that there is no download() function, but we have the data in our repo, so you should be able to use the dataset without specifying a rootdir, if you install as suggested above.

mosheliv commented 2 years ago

Note that this is not how your installation instructions go... "You can install merlion from PyPI by calling pip install salesforce-merlion. You may install from source by cloning this repo, navigating to the root directory, and calling pip install ., or pip install -e . to install in editable mode. You may install additional dependencies for plotting & visualization via pip install salesforce-merlion[plot], or by calling pip install ".[plot]" from the root directory of this repo.

To install the data loading package ts_datasets, clone this repo, navigate to its root directory, and call pip install -e ts_datasets/. This package must be installed in editable mode (i.e. with the -e flag) if you don't want to manually specify the root directory of every dataset when initializing its data loader." I'll give it another try although I more or less went with another package just because it was easier to use.

Basically, it would be really fantastic if you could add to the repository a working version of the example notebooks on Colab, as this is how people test new packages nowadays. A Dockerfile is also fine, just something that works out of the box. I'll retry nbeats on Colab and will let you know if I encounter more problems.

mosheliv commented 2 years ago

If I get a Colab notebook running well, I'll make a pull request. Same with Docker. Your product looks very robust, but the proof is in the pudding, as they say here.

aadyotb commented 2 years ago

Thank you @mosheliv! We tested the installation on Mac and Linux environments directly, but hadn't thought to test it in Colab. I can update the installation instructions to match what I described above, as this is more robust overall. A Colab notebook example and/or Dockerfile from you would be very welcome as well.

aadyotb commented 2 years ago

PR #38 updates the installation instructions as discussed.