kaizen-ai / kaizenflow

KaizenFlow is a framework for Bayesian reasoning and AI/ML stream computing
GNU General Public License v3.0

Cross-exchange arbitrage CEX-CEX #2

Open ghost opened 1 year ago

ghost commented 1 year ago

Specs are at https://docs.google.com/document/d/1ELLDf7dg3nli6nLYMpQ9IxuTW5dYdN15nluNCZbZmD4/edit#heading=h.1cihk5d2qi72

gpsaggese commented 1 year ago

Assigned to @Chandramani05

Let's discuss the specs in more detail in the gdoc

Chandramani05 commented 1 year ago

Ok

Chandramani05 commented 1 year ago

Do you mean the Google Doc in the link above? https://docs.google.com/document/d/1ELLDf7dg3nli6nLYMpQ9IxuTW5dYdN15nluNCZbZmD4/edit#heading=h.1cihk5d2qi72

Chandramani05 commented 1 year ago

Hello Dr. Saggese

I have read the paper "Trading and Arbitrage in Cryptocurrency Markets" up to section 4.

I have summarized the important points in the attached PDF and will start the coding part of gathering the data provided by tomorrow.

Queries : A) I am still not clear about the main output of the model like what is the final goal we are trying to achieve here.

B) Is it OK if I use google colab for the project or you suggest a private IDE like pycharm ?

PS : I will try to complete the paper as soon as possible. Since I have to implement the paper I am trying to get a better understanding of it.

Let me know if any concerns

Thanks and Regards Chandramani _Project_CEX-CEX.pdf

Chandramani05 commented 1 year ago

I don't have permission to edit : https://drive.google.com/drive/u/0/folders/1KCU3Hiw3Vy_h_3wDHAn5naT6lF4SdelJ. Please help

ghost commented 1 year ago

Hello Dr. Saggese

I have read the paper "Trading and Arbitrage in Cryptocurrency Markets" up to section 4.

Very good start, Chandramani.

Can you pls copy paste this email in the https://github.com/sorrentum/sorrentum/issues/2 ? I’ll respond there in the same way.

We prefer to communicate on GitHub rather than by email, so that everything is nicely organized by topic.

I have summarized the important points in the attached PDF and will start the coding part of gathering the data provided by tomorrow.

Good job putting together a research paper. This is going to be useful when we want to publish the results.

Can you pls save your doc as a Google Document in your dir

https://drive.google.com/drive/u/0/folders/1KCU3Hiw3Vy_h_3wDHAn5naT6lF4SdelJ

so that we can discuss the paper directly there.

Queries : A) I am still not clear about the main output of the model like what is the final goal we are trying to achieve here.

The final model will get real-time data (say minutely data) from all the exchanges, compute the dispersion, pick coins that are out of balance, and then trade in a way to reduce the imbalance, making profits.

This paper is really crude: we will add machine learning and lots of more complex stuff (like https://github.com/sorrentum/sorrentum/issues/3) on top of it. We have all the components already built and we just need the model.

The first goal is to get familiar with the data and reproduce what others have done.
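
As a rough illustration of the dispersion step described above, here is a minimal pandas sketch. The exchange names, coins, prices, the dispersion definition ((max - min) / mean across exchanges), and the 1% threshold are all assumptions made up for this example; none of them come from the specs or the paper.

```python
import pandas as pd

# Hypothetical minutely close prices: rows are timestamps, columns are
# (exchange, coin). All names and numbers are made up for illustration.
index = pd.date_range("2023-01-01 00:00", periods=3, freq="min")
columns = pd.MultiIndex.from_product(
    [["exchange_a", "exchange_b", "exchange_c"], ["BTC", "ETH"]],
    names=["exchange", "coin"],
)
prices = pd.DataFrame(
    [
        [100.0, 10.0, 101.0, 10.0, 99.0, 10.0],
        [100.0, 10.0, 100.0, 10.0, 100.0, 10.0],
        [100.0, 10.0, 104.0, 10.1, 100.0, 10.0],
    ],
    index=index,
    columns=columns,
)

# Group the columns by coin (via the transposed frame) and compare exchanges.
by_coin = prices.T.groupby(level="coin")
highest = by_coin.max().T
lowest = by_coin.min().T
average = by_coin.mean().T

# Assumed dispersion measure: price range across exchanges relative to the mean.
dispersion = (highest - lowest) / average

# Assumed rule: a coin is "out of balance" when dispersion exceeds 1%.
out_of_balance = dispersion > 0.01
print(out_of_balance)
```

A real model would also have to account for fees, latency, and transfer constraints before acting on such a signal.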

B) Is it OK if I use google colab for the project or you suggest a private IDE like pycharm ?

Of course. You can save the Colab notebook in the same dir above.

If you need more computing power we can create an account on one of our servers.

PS : I will try to complete the paper as soon as possible. Since I have to implement the paper I am trying to get a better understanding of it.

Let me know if any concerns

Sounds good.

Let’s reach a checkpoint and we will do a sync in person.

Thanks and Regards Chandramani _Project_CEX-CEX.pdf

gpsaggese commented 1 year ago

I don't have permission to edit : https://drive.google.com/drive/u/0/folders/1KCU3Hiw3Vy_h_3wDHAn5naT6lF4SdelJ. Please help

Permissions changed. Can you pls check again?

Chandramani05 commented 1 year ago

Yes, I have uploaded the report now. I will keep editing it and saving my progress there. Thank you!!

gpsaggese commented 1 year ago

@Chandramani05 can you put the document as Google Doc in that dir instead of a PDF? In this way it's easier to collaborate since I can add comments and suggestions on the document directly (like we do for https://docs.google.com/document/d/1ELLDf7dg3nli6nLYMpQ9IxuTW5dYdN15nluNCZbZmD4/edit#heading=h.6a527al82waq).

Makes sense?

Chandramani05 commented 1 year ago

Yes. Sorry about before. I have uploaded the Google Doc.

gpsaggese commented 1 year ago

Adding @Addy-mufc

We'll coordinate for the work in the next office hours

Chandramani05 commented 1 year ago

What is the password/token to access the Jupyter notebook from docker?

gpsaggese commented 1 year ago

There should be no password running locally. Can you post a picture of your window?

Chandramani05 commented 1 year ago

Here it is: Screenshot 2023-02-12 at 7 02 19 PM

gpsaggese commented 1 year ago

This is weird since

> more docker_jupyter.sh
#!/bin/bash -xe

GIT_ROOT=$(git rev-parse --show-toplevel)

REPO_NAME=sorrentum
IMAGE_NAME=jupyter
FULL_IMAGE_NAME=$REPO_NAME/$IMAGE_NAME

docker image ls $FULL_IMAGE_NAME

CONTAINER_NAME=$IMAGE_NAME
docker run --rm -ti \
    --name $CONTAINER_NAME \
    -p 8888:8888 \
    -v $GIT_ROOT/sorrentum_sandbox:/data \
    $FULL_IMAGE_NAME \
    /data/devops/jupyter_docker/run_jupyter.sh

jupyter-notebook --port=8888 --no-browser --ip=0.0.0.0 --allow-root --NotebookApp.token='' --NotebookApp.password=''

so we are explicitly doing password-less login.

1) Can you post the output of when you run the container?

2) Can you try just pressing Enter for the password? Or try to set a new password?

3) You might have some cookie cached that is confusing the server.

If it doesn't work we can do a quick screen share and see what's the problem. I think there is something weird going on.

Chandramani05 commented 1 year ago

I will try to set up a password or find another method. Here is the output:

(base) chandramaniyadav@ChandramanisMBP jupyter_docker % sh docker_jupyter.sh
REPOSITORY          TAG       IMAGE ID       CREATED        SIZE
sorrentum/jupyter   latest    c6fa3255a1ea   28 hours ago   997MB
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested

Chandramani05 commented 1 year ago

Solved. It was because I was running another notebook in VS Code. When I closed it, it authenticated without a password.
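
One plausible explanation, which is an assumption and was not confirmed in this thread, is that the other notebook server was already listening on port 8888, so the browser reached that server instead of the container. A small sketch to check whether a local port is already taken before starting the container (`port_in_use` is a hypothetical helper, not part of the repo):

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(8888):
        print("Port 8888 is busy: another server may answer instead of the container.")
    else:
        print("Port 8888 is free.")
```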

Chandramani05 commented 1 year ago

Progress Report and Queries :

  1. I have grouped the data by currency pair. There are 27 currency pairs in the data. See the image: Screenshot 2023-02-13 at 4 11 30 PM
  2. I also tried to plot the close price by timestamp for each currency pair, but the plot is not very clear.
Screenshot 2023-02-13 at 4 11 43 PM

Issues :

  1. The data is too big, and plotting it takes too much time and memory. Can you suggest a way to do the EDA efficiently?
  2. Is this the correct way to calculate the triangular arbitrage of a currency_pair?

         for name, group in grouped:
             high_price = group["high"].max()
             low_price = group["low"].min()
             df.loc[group.index, "triangular_arbitrage"] = high_price / low_price

Let me know if I am approaching this the wrong way.

Thanks!!
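
A note on query 2: the high/low ratio measures the price range of a single pair over the sample, which is a different quantity from triangular arbitrage. In its usual textbook form, triangular arbitrage chains three conversions that start and end in the same currency and checks whether the product of the rates exceeds 1. The sketch below uses made-up rates, ignores fees and bid/ask spreads, and is only an illustration, not the method planned for this project.

```python
from itertools import permutations

# Hypothetical rates: rates[(a, b)] = units of b received for one unit of a.
rates = {
    ("USDT", "BTC"): 1 / 20000.0,
    ("BTC", "ETH"): 13.0,
    ("ETH", "USDT"): 1550.0,
    ("BTC", "USDT"): 20000.0,
    ("ETH", "BTC"): 1 / 13.1,
    ("USDT", "ETH"): 1 / 1551.0,
}


def triangular_cycles(rates, start):
    """Yield (path, gross_return) for each 3-leg cycle from `start` back to `start`."""
    others = sorted({c for pair in rates for c in pair} - {start})
    for first, second in permutations(others, 2):
        legs = [(start, first), (first, second), (second, start)]
        if all(leg in rates for leg in legs):
            gross = 1.0
            for leg in legs:
                gross *= rates[leg]
            yield (start, first, second, start), gross


for path, gross in triangular_cycles(rates, "USDT"):
    # A gross return above 1.0 means the cycle ends with more than it started.
    print(" -> ".join(path), round(gross, 4))
```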

gpsaggese commented 1 year ago

Good work @Chandramani05

FYI you are using 1-min data and we are going to work with seconds / milliseconds data... so data is going to get much bigger. In any case, we have tools to handle the data size issues and we can get you a login on our AWS instances with 128GB of memory, so memory is not an issue. Let's keep playing with small data sets and then we graduate to monster ones.

1) If you keep all the data in a multi-index df (see discussion on #1 from @FirstSingularity) it's going to be fast to compute. Plotting 2 years of data at 1 minute resolution is always going to be super-slow. My suggestion is to compute with vectorized operations on the full data and sub-sample to an hourly rate only when plotting.

2) As to the triangular arbitrage things are a bit more complex.

3) For triangular arbitrage, I'll write down some notes on how to model it as an optimization problem. It is like looking for certain paths on a graph.

4) You can work on #33 which is the data that we will need to look for the arbitrage opportunities
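
The suggestion in 1) can be sketched as follows: compute on the full 1-minute series with vectorized operations, then down-sample to hourly only for plotting. The data here is a synthetic random walk and the variable names are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute close prices for one week (made-up random walk).
rng = np.random.default_rng(0)
index = pd.date_range("2023-01-01", periods=7 * 24 * 60, freq="min")
close = pd.Series(
    100 + rng.standard_normal(len(index)).cumsum(), index=index, name="close"
)

# Compute on the full-resolution data with vectorized operations (no Python loops).
returns = close.pct_change()

# Down-sample only for plotting: keep the last observation of each hour.
hourly_close = close.resample("60min").last()

print(len(close), "points at 1-minute resolution")
print(len(hourly_close), "points after hourly sub-sampling")
# hourly_close.plot()  # requires matplotlib; far fewer points to draw
```

Taking the last value per hour is one choice among several; `mean()` or `ohlc()` would also work depending on what the plot should show.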

jsmerix commented 1 year ago

@jsmerix is adding more exchanges in https://drive.google.com/drive/u/0/folders/1O9FCMY61LDUCjiT4nPBG9XPpCmbTsCFC

Updated the database of datasets. Refer to the following gdrive folder to access them.

thejameszhang commented 1 year ago

I know that Chandramani has already done this, but I am having trouble creating the Parquet Dataset. I get this error.

cex_error

It could be because I'm not sure if I ran the Docker container correctly.

thejameszhang commented 1 year ago

I was trying to follow along with the directions in the readme file here https://github.com/sorrentum/sorrentum/tree/master/sorrentum_sandbox but I was never able to use airflow:airflow to log in to localhost:8090

gpsaggese commented 1 year ago

I was trying to follow along with the directions in the readme file here https://github.com/sorrentum/sorrentum/tree/master/sorrentum_sandbox but I was never able to use airflow:airflow to log in to localhost:8090

The link to the Docker container with Jupyter is https://github.com/sorrentum/sorrentum/blob/master/sorrentum_sandbox/devops/jupyter_docker/README.md

The link https://github.com/sorrentum/sorrentum/tree/master/sorrentum_sandbox is for a Sorrentum node that represents our infra. Did you find this link in the documentation? If so, pls let me know where so I can clarify

gpsaggese commented 1 year ago

I know that Chandramani has already done this, but I am having trouble creating the Parquet Dataset. I get this error.

It could be because I'm not sure if I ran the Docker container correctly.

I'll let @Chandramani05 help you, but I don't know if it's a problem with Windows (I see you are using that from your path) having issues reading the data for some crazy reason. Docker solves all these problems by providing a reproducible environment independently of OS.

You can read about it on https://github.com/gpsaggese/umd_data605/blob/main/lectures/02.2%20-%20Docker%20DevOps.pdf

Also @samarth9008 can help here. Samarth is the TA for DATA605.

samarth9008 commented 1 year ago

I know that Chandramani has already done this, but I am having trouble creating the Parquet Dataset. I get this error. cex_error

Maybe you need to escape the backslash in your path by doubling it, as in "\\bulk". When you write \b in a string, Python interprets it as a backspace character.

gpsaggese commented 1 year ago

Good point @samarth9008. Another approach is to use r'\Users...' to prevent Python from interpreting special symbols.
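
A short sketch of the two fixes mentioned above, using a made-up relative path (the real path from the screenshot is not reproduced here):

```python
from pathlib import PureWindowsPath

# "\b" inside a normal string literal is the backspace character (ASCII 8),
# so this string does NOT contain a backslash followed by "b".
broken = "data\bulk"

# Fix 1: escape the backslash by doubling it.
escaped = "data\\bulk"

# Fix 2: use a raw string so backslashes are kept literally.
raw = r"data\bulk"

# Alternative: build the path from parts and let pathlib add the separators.
joined = str(PureWindowsPath("data") / "bulk")

print(repr(broken))   # shows the \x08 backspace character
print(repr(escaped), repr(raw), repr(joined))
```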

Chandramani05 commented 1 year ago

@samarth9008 I am using the same code.

Screenshot 2023-02-20 at 9 22 47 AM

I run the code on my local computer. Let me try to run on the docker container and I will let you know if I get the same error.

gpsaggese commented 1 year ago

The problem is that, for @thejameszhang, the host outside the container is Windows, where paths use \, which Python treats as an escape character in string literals. Inside the container (and on Mac), paths use /, which is not ambiguous.

Long story short: we should always use the container (and Windows just creates a bunch of avoidable problems)

Chandramani05 commented 1 year ago

It worked for me while running on the docker container also. I think we need to specify the correct path :

Screenshot 2023-02-20 at 9 42 43 AM

thejameszhang commented 1 year ago

Got it to work. Thanks guys! @Chandramani05 @samarth9008

thejameszhang commented 1 year ago

multiindex

why do values in the volume and vwap columns become NaNs when trying to make a multi-index df?

thejameszhang commented 1 year ago

oh got it, should be np.array(example_df) not just example_df
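
The screenshot is not available, so the exact cause is unknown. One common way to get this symptom is pandas label alignment: when a Series or DataFrame is assigned into another frame, values are matched by index labels, and rows without a matching label become NaN. Converting to a NumPy array drops the labels, so values are assigned by position. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Source data with a default integer index 0, 1, 2 (made-up values).
example_df = pd.DataFrame({"volume": [10.0, 20.0, 30.0], "vwap": [1.0, 2.0, 3.0]})

# Target frame indexed by timestamps: its labels do not match 0, 1, 2.
timestamps = pd.date_range("2023-01-01", periods=3, freq="min")
target = pd.DataFrame(index=timestamps)

# Assigning a Series aligns on index labels: nothing matches, so all NaN.
target["volume_aligned"] = example_df["volume"]

# Assigning a NumPy array ignores labels and fills by position.
target["volume_values"] = np.array(example_df["volume"])

print(target)
```

Positional assignment silently assumes the rows are already in the same order, so it is worth checking the lengths and ordering first.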

thejameszhang commented 1 year ago

multi_merged

created the single df with the added exchange level
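
For readers following along, one way to build a single df with an added exchange level is `pd.concat` with a dict of per-exchange frames. This is a sketch with hypothetical exchange names and values, and not necessarily how the notebook does it.

```python
import pandas as pd

# Hypothetical per-exchange frames sharing the same timestamps.
index = pd.date_range("2023-01-01", periods=2, freq="min")
binance = pd.DataFrame({"close": [100.0, 101.0], "volume": [5.0, 6.0]}, index=index)
kraken = pd.DataFrame({"close": [100.5, 100.8], "volume": [2.0, 3.0]}, index=index)

# The dict keys become the outer column level.
merged = pd.concat({"binance": binance, "kraken": kraken}, axis=1)
merged.columns = merged.columns.set_names(["exchange", "feature"])

# Select one feature across all exchanges.
close_by_exchange = merged.xs("close", axis=1, level="feature")
print(close_by_exchange)
```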

thejameszhang commented 1 year ago

i see i can't directly push to master, what is the best way to make a pull request for the notebook

gpsaggese commented 1 year ago

@samarth9008 can you pls help @thejameszhang do a PR?

Feel free to put together a short paragraph in https://docs.google.com/document/d/1QqNyFWUmkHjb22I5Od2BMUrapcSOhb3QZWWdm35vIu0/edit#heading=h.q9zx7ft1vyej on how to do these operations, so we can point people to the doc

The best approach is to write the gdoc and point @thejameszhang to it to see if it's clear.

samarth9008 commented 1 year ago

@thejameszhang

I have updated the word doc with guidelines on how to contribute to this project. Let me know if things are not clear to you.

thejameszhang commented 1 year ago

Thank you. Created the PR and assigned you guys just now

gpsaggese commented 1 year ago

The PR is https://github.com/sorrentum/sorrentum/pull/118