salesforce / Merlion

Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion dashboard app #129

Closed. yangwenzhuo08 closed this 1 year ago.

yangwenzhuo08 commented 1 year ago

This PR implements a web-based visualization dashboard for Merlion. Users can set it up by installing Merlion with the optional dashboard dependency, i.e. pip install salesforce-merlion[dashboard]. They can then launch it with python -m merlion.dashboard, which serves the dashboard on port 8050. The dashboard has 3 tabs: a file manager where users can upload CSV files & visualize time series; a forecasting tab where users can try different forecasting algorithms on different datasets; and an anomaly detection tab where users can try different anomaly detection algorithms on different datasets. The dashboard thus provides a no-code interface for users to rapidly experiment with different algorithms on their own data, and to examine performance both qualitatively (through visualizations) and quantitatively (through evaluation metrics).

We also provide a Dockerfile which runs the dashboard as a microservice on port 80. The Docker image can be built with docker build . -t merlion-dash -f docker/dashboard/Dockerfile from the Merlion root directory. It can be deployed with docker run -dp 80:80 merlion-dash.

aadyotb commented 1 year ago

@yangwenzhuo08 thanks for your changes! This looks great. I've finished what you started in terms of restructuring the module. Now, merlion.dashboard is fully integrated into Merlion itself. The dashboard's dependencies have been added as optional requirements in setup.py, so the user can install the dashboard with pip install salesforce-merlion[dashboard]. The user may manually start up the dashboard with python -m merlion.dashboard, or serve it with Gunicorn via gunicorn -b 0.0.0.0:80 merlion.dashboard.server:server. Additionally, the dashboard is now able to handle exogenous regressors.
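For reference, a rough sketch of how an optional extra like this is typically declared in setup.py; the dependency names below are assumptions for illustration, not copied from Merlion's actual setup.py:

```python
# Illustrative only: declaring an optional "dashboard" extra in setup.py.
# The packages listed here are hypothetical, not Merlion's actual requirements.
from setuptools import setup, find_packages

setup(
    name="salesforce-merlion",
    packages=find_packages(),
    extras_require={
        # enables `pip install salesforce-merlion[dashboard]`
        "dashboard": ["dash", "dash-bootstrap-components", "gunicorn"],
    },
)
```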

In terms of my original comments, can you add the documentation I requested previously? Besides this, I have a couple of new requests.

  1. Would it be possible for you to unify the train/test interface for anomaly detection and forecasting? I think both tasks should allow the user to either (a) upload separate train/test files, or (b) upload a single file and choose a train/test split.
  2. Can you allow max_forecast_steps = None to be a valid specification? It's actually the default setting for most models and is necessary for long-horizon forecasting.
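Regarding point 2, a minimal sketch of what this setting looks like on the model side (using Arima purely as an example model):

```python
# Sketch of request (2): Merlion forecaster configs accept max_forecast_steps=None,
# which leaves the forecast horizon unbounded and is the default for most models.
from merlion.models.forecast.arima import Arima, ArimaConfig

# max_forecast_steps=None allows forecasting arbitrarily far into the future,
# which the dashboard should accept as a valid specification.
model = Arima(ArimaConfig(max_forecast_steps=None, order=(4, 1, 2)))
```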
yangwenzhuo08 commented 1 year ago

@aadyotb Thanks for the revision. For the forecasting tab, we can accept separate train and test files, as the anomaly detection tab does. However, for combining the two UIs (uploading two files vs. uploading a single file with a split fraction), I'm not sure which layout is better. Do you have any suggestions on the UI design for this part? For forecasting it may be straightforward: e.g., two dropdown lists, one for the train file and one for the test file, plus a slider to set the split fraction used to divide the training data into "train" and "validation". But for anomaly detection, such a split is problematic when the number of labels is small, i.e., the resulting validation dataset may contain no anomalies at all.
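For concreteness, a minimal sketch of a fraction-based chronological split; the names here (df, train_frac) are hypothetical and not part of the dashboard code:

```python
# Illustrative fraction-based train/validation split for a time series stored in
# a pandas DataFrame with a datetime index.
import pandas as pd

def split_by_fraction(df: pd.DataFrame, train_frac: float = 0.8):
    """Split a time-indexed DataFrame chronologically into train/validation parts."""
    df = df.sort_index()
    n_train = int(len(df) * train_frac)
    return df.iloc[:n_train], df.iloc[n_train:]

# Example: a slider value of 0.8 keeps the first 80% of timestamps for training.
# train_df, valid_df = split_by_fraction(
#     pd.read_csv("data.csv", index_col=0, parse_dates=True), 0.8)
```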

aadyotb commented 1 year ago

@yangwenzhuo08 I envision something like the following: you can have a radio button that selects either "use same file for train/test" or "use separate test file". If you select "use same file for train/test", you get a slider where you specify the train/test fraction. If you select "use separate test file", you get a prompt to choose the test file, and the module should throw an error if the test data is not given. What do you think?
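A minimal sketch of this conditional layout, assuming the dashboard is built with Plotly Dash; the component IDs and callback below are hypothetical, not the dashboard's actual code:

```python
# Hypothetical Dash sketch: a radio button toggles between a split-fraction slider
# (single file) and a test-file dropdown (separate files).
from dash import Dash, dcc, html
from dash.dependencies import Input, Output

app = Dash(__name__)
app.layout = html.Div([
    dcc.RadioItems(
        id="test-data-mode",
        options=[
            {"label": "Use same file for train/test", "value": "single"},
            {"label": "Use separate test file", "value": "separate"},
        ],
        value="single",
    ),
    # Shown when "single" is selected: choose the train/test split fraction.
    html.Div(dcc.Slider(id="train-frac", min=0.5, max=0.95, step=0.05, value=0.8),
             id="split-slider-div"),
    # Shown when "separate" is selected: pick the test file.
    html.Div(dcc.Dropdown(id="test-file", options=[]), id="test-file-div"),
])

@app.callback(
    Output("split-slider-div", "style"),
    Output("test-file-div", "style"),
    Input("test-data-mode", "value"),
)
def toggle_test_inputs(mode):
    """Show the slider for a single-file split, or the dropdown for a separate test file."""
    if mode == "single":
        return {"display": "block"}, {"display": "none"}
    return {"display": "none"}, {"display": "block"}
```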

And in terms of anomaly detection, it's kind of a well-known issue that the labels are sparse. The evaluation metrics are implemented in such a way that they have reliable fallback options if there are no true positives present in the data. Maybe you can use the plot_anoms helper function in merlion.plot to plot the ground truth anomalies (if they are specified), and then also report the evaluation metrics on both train and test?
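A rough sketch of what using plot_anoms for the ground-truth labels could look like; the CSV path and column names are hypothetical:

```python
# Sketch: overlay ground-truth anomaly labels on a series with merlion.plot.plot_anoms.
# File name and column names ("value", "anomaly") are placeholders.
import matplotlib.pyplot as plt
import pandas as pd
from merlion.plot import plot_anoms
from merlion.utils import TimeSeries

df = pd.read_csv("my_data.csv", index_col=0, parse_dates=True)
labels = TimeSeries.from_pd(df[["anomaly"]])   # 0/1 ground-truth anomaly labels

fig, ax = plt.subplots()
ax.plot(df.index, df["value"])                 # the observed metric
plot_anoms(ax=ax, anomaly_labels=labels)       # shade the labeled anomalous regions
plt.show()
```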

yangwenzhuo08 commented 1 year ago

So the layout is like this:

  1. A radio button to select "single file" or "separate"
  2. A dropdown list to select the train file
  3. If "single file" is selected, it shows a slider to set the split ratio; if "separate" is selected, it shows a dropdown list for choosing the test file. Is this OK?
aadyotb commented 1 year ago

Yes, this sounds good.