portfolio-performance / portfolio

Track and evaluate the performance of your investment portfolio across stocks, cryptocurrencies, and other assets.
http://www.portfolio-performance.info
Eclipse Public License 1.0
2.79k stars 583 forks source link

anybody interested in PP with a database? #2216

Open MichaelMeier99 opened 3 years ago

MichaelMeier99 commented 3 years ago

Is your feature request related to a problem? Please describe. PP is wonderful. Never the less it is slow in opening and the data take much space

Describe the solution you'd like It would be a boost for performance and power to compact everything into a Firebird database. The data would load on scratch. The options to get information about our portfolio boost to limitless (SQL).

Additional context The time and code this complete change of data-storage is enourmous. I cannot handle it alone. And it will take a handful of beta-testers until we can dare to offer the code to our public. Is there anybody out there, who likes the project as well?

What I can give:

What I do not know how to at the moment

So please let me know 2 things for now: 1) Do you like the idea? 2) Are you willing and able to work with me?

RomanLangrehr commented 3 years ago

I really like the idea. I also think it would be useful to store historical open, high, low, close, volume for the (at the moment, PP only stores the close quote. High, low and volume are only stored for the most recent date and open is not stored/fetched) at all. They could be used to calculate, for example, maximum drawdown more accurately, and you could display the trading volumes (and candlestick charts instead of normal charts) in the chart tab.

However, PP stores a lot of different information in the xml file (GUI settings, transactions etc.) and thus I think "compact everything" in a SQL database would require a quite complex database scheme and might be less flexible (and a lot of work). So I think, at least for a starter, we should only put the historical quotes in a database.

Pnda87 commented 3 years ago

I also like the idea to separate the data into database tables: quotes, securities, transactions, .... and have it accessible through sql. GUI information for PP could stay in the xml file

it is definitely an enormous task to implement that - as you already said - but together with the rest api introduced with #2085 it would open up quite some interessting posibilities for the future..

debegr92 commented 3 years ago

There was also a comment in a Telegram chat, that it would be great to separate quotes and transactions. With this model, all securities with historical (OHLC+V) values can be stored very efficient in a SQLite file for example. Helpful for investors with more than one account (kids etc.) but the same stocks.

MichaelMeier99 commented 3 years ago

Thank you for the comments!

"Candlestick", Drawdowns, Runups... These are old wishes of mine. Without candlestick charts I feel blind. For this reason I have to click and click to check my stocks insider and outside PP. WOW, it would be great to have candles in PP finally.

So we have 2 topics or better 3: 1) tables for quotes 2) tables for trades. 3) tables for admin

to start with 1) tables for quotes I can generate SQL quode for generation of these tables, (at least I have a tool which says to do that. The tool is freeware and called Flamerobin. It is the great to work in Firebird. The question is: Shall we do it in Firebird? Or SQLLite? For the latter I have neither admin tools, nor any knowledge. Firebird and Flamerobin talk SQL dialect 3. So who perfers SQLLite should be able to translate the dialect 3 to a SQL, which SQLite unterstands.

The most important about tables is: Do not safe the dates within the quotes table,. To keep tenthousands of date-Vars for nothing is a burden. My tables will avoid this. Never the less they have a trick, which may be a problem: They only safe working days at the moment. For Saturday and Sunday there is no entry. Saves lots of entries in hundreds of series of 20 years. Never the less this must be KNOWN by ever programmer. Otherwise the error-messages are a problem.

The next thing is: What shall we have in the tables? My tables work for Futures. So they have open, high, low, close, volume, openinterest and a lot more for maturity and shortcuts which can work even before the year 2000 and the "millenium bug". Shall we take this over? I tend to "no". Shortcuts we have to look close at. Because the tables shall know different shortcuts besides the ISIN. Singapur has its own shortcuts, which are different for shortcuts for the same stock in Europe. Next question: Shall we work for options as well? Then we would need fields for put or call and strikes at least.

about 2) trades good news: I have tables for trades as well. Never the less, this may be too much for a first task. As I wrote all those tables and their transactions, - I know HOW MUCH work it is. My tables for trades contain lots of nice stuff, like risk-management, value-at-risk-scenarios and diversification und correllation. I must check all these for portability. I wrote them for myself and just use them by my own.

3) the admin shall slip to the database very soon. The password protection will work nicely. Never the less I agree, that the vage thoughts, - how much work there is behind all this - , causes me feelings of panic.

So first step is to decide for a database: SQLite or Firebird?

Who would be able to bring my SQL table generation codes into the db?

I can offer both:

RomanLangrehr commented 3 years ago

The most important about tables is: Do not safe the dates within the quotes table

Why? I think it is a bad idea not to safe the date. The public holidays differ from country to country so you easily end up having different gaps in the historical quotes for different securities/indices. It sounds not robust enough, when we don't store the date. And the difference in terms of file size should not be too big.

MichaelMeier99 commented 3 years ago

Roman, yes, agree. Can lead to troubles. Under the line the thing will be easier. Because we can fill the calender just day by day without checking for Saturday / Sunday.

But before we fill anything: Firebird or SQLite?

debegr92 commented 3 years ago

Save the dates as standard string with yyyy-mm-dd and use SQLite.

MichaelMeier99 commented 3 years ago

Roman, yes, agree. Can lead to troubles. Under the line the thing will be easier. Because we can fill the calender just day by day without checking for Saturday / Sunday.

But before we fill anything: Firebird or SQLite?

MichaelMeier99 commented 3 years ago

SQLite, - if somebody can use it here, - fine to me. I will provide the tables within today or tomorrow.

Standard string as date?! - May lead to project-terminating troubles: e.g. in the USA, the do not write "31.12.2021", but "12.31.2021". If you have a date from an USA-data provider wich reads "2.3.2021" - so what does it mean? The 2nd of March oder the 3rd of February?

Even Windows 10 has troubels with date as string, as they have the option 31.12.2021 alternativly 31.12.21. Costed me months to find, why one of my software elements has a zero instead of a date, - which was taken from Windows.

I suggest to use the informatic date-format, which are figures Julian's date. Every software can work with it and there is no ambiguity: Window, Unix, USA and Europe.

RomanLangrehr commented 3 years ago

We just have to pick the date format independent of the Locale. If we always use yyyy-mm-dd (as PP does right now), it should be fine. (And that format is not ambiguous)

MichaelMeier99 commented 3 years ago

Please help! I have no practise, how github works. I made 3 files and am not sure, how to upload them at the right place and if and how to start a new thread for them. Can somebody help me?

These 3 files are:

In words: The table tbkurszeilen contains the quotes and every line contains a foreign key at the stock (in table tbKontrakte) and the date. The table tbKalender contains all dats in Julian Format. The table tbKontrakte contains the basic information to each contract / stock.

Hope, somebody can work on with this!

Tabellen Basisfunktion als Grafik tbKalender_Fuellen SQLCode und Schleifen.txt

SQL am 16.4.2021.SQL.txt

MichaelMeier99 commented 3 years ago

Roman: We want to IMPROVE the performance and the needed space. None of this can be done by saving the date as string.

Good code works like this: Read the date from the database and convert it by a function from Julian to string, if needed. Any programming language has prefab as it is - state of the art. THEN we can pass it to PP.

And remember: We will offer candle-stick-charts in the Future. Charts drawing snippets work with SQL and SQL works with date of Julian. To destroy the slim function by a string-format is not a good idea. Just check how much BYTES these string would take compared with the double-value of the usual date format. When PP uses date strings at the moment instead of date - we shall change this as well. Maybe after this, it will load more quickly.

RomanLangrehr commented 3 years ago

I don’t think the date format will have a significant impact on the performance, we will have to convert them into java.time.LocalDates anyway. But I agree that a date format built in to SQL (or whatever database we will use) might be useful, so that we can, for example, query on the quotes within a certain time frame. And I am not an expert on SQL/databases, but why would we store the dates in the Julian calendar? I guess there could be plenty of problems arising from conversion between Julian and Gregorian calendar and I don't think anybody will have quotes earlier than 15th of October 1582 in PP. ;)

MichaelMeier99 commented 3 years ago

The format definitely will impact to the performance, with a HUGE impact.

Look here: http://docwiki.embarcadero.com/RADStudio/Sydney/de/Delphi-Datentypen A date as Julian has the type double and therefore 8 Bytes. A string has 16 bytes. And you need: YYYY-MM-DD ► 10 of them. So just one single day is 160 bytes instead of 8 bytes! This is a factor of incredible 20. Yes, I know, that there are verious ways of saving strings within bytes, but this is not the point. Even if the factor would "only" be 10 or 2, this means a lot for the performance. All those bytes wanted to be brought into the RAM and to be shifted around. INTERNALLY the Julian date is calculated all the time. For this reason the string date has to be brought every singe letter internally to Julian. No computer is able to figure dates correctly. Dates are not binary, not hexadecimal, but a Babylonian system using 60! (60 seconds = 1 minute). No computer can do this directly. So the string system will fill up the RAM AS WELL as it keeps the CPU busy - in converting internally to Julian.

We will not start with Jesus Christ, we start at the 1.1.1900, which is used all around at computers (Excel, Delphi, Databases,...) This is the astronomic format. Try an Excel sheet or Calc in Open Office. Force the date to the different view and you will see, that all works internally with this very format, I suggested. I thought Oracle has developed this. This Astronomic Juilan is a very old and standard since decades: https://support.microsoft.com/de-de/office/einf%C3%BCgen-von-julianischen-datumsangaben-funktionen-0c7fa6aa-daff-402e-9990-93a5b76ba018

So and now I have the killer argument smile: All my tables (which I uploaded above) and the SQL codes I can provide, will only work with this informatic date format.

============================

So this was my knowledge about databases and bytes and Julian. Now I need YOUR help! Here is my limit of ability. Please bring those tables into a database, which JAVA can interact. If this runs, I provide you with the further SQL snippets for it. As it really is a lot of work to extract them, I will wait with it, until I am sure, somebody can be this "missing link" into Java.

Have a great weekend!

ngosang commented 3 years ago

I think it will be great to store the data in a relational database. I don't have time to work on it but I can give you some advise:

MichaelMeier99 commented 3 years ago

Thank you for the useful hints. Unfortunately I only work with Firebird (just one file as well, great and slim database), and not SQLiite (never tried). People who prefer SQLite (I do not belong to them): If you prefer it, you have to do it.

Yes, I am aware, that thie is a complete change. Never the less I doubt, that anybody will have enough free time, to create a version with conversion option. This would be about triple of development from the code as we know it today. The whole conversion interface would have to be developed and tested. The additional transactions with the database are many, many, many. This project seems to me a manpower work of a year. Has anbyody a sabatical year? ;-)

I love your idea and it is ideal. Nether the less it looks unrealistic to me.

Realistic may be: a complete new version with a database. Who does not like it, shall use the old version. No conversion to xml data, no compability. Because this conversion is much more work, than grabbing the data newly from the source. Yes, data may be lost. But before all this complex software is ready, the person, who claimed his data, has sold the stock already or went broke.

The only thing, which shall be saved to the new version is the path of the quotes, e.g. to yahoo or the page-link. The database will store open, high, low, close and volume, the existing system just close. How will you explain this to the user, who tries to convert xml to database and back? After your donated a year, - they will hate you for this, what they do not undestand.

Database IMHO can only be realized by setting a complete version-break to old quotes in XML and start a new version. The new version will be a jet compared to a train.

As I do not know the project in the deep, I hope, that anybody knows, if there are elements affected, we have not thought about up to now.

MichaelMeier99 commented 3 years ago

PS: If you have not enough time for this huge project: Do you have the time to set up the new branch with my already above posted SQL codes? I have no experience at github. And so the chance my first attempt will be a match, is very low.

chrisaut commented 3 years ago

Nobody will setup a branch for you just to experiment, and even if they did you wouldn't have write access anyways. Create a fork of the project, then you can branch locally and invite people to contribute on your fork, and eventually send PRs upstream.

Just dumping a bunch of SQL scripts from another project, telling people not to change anything and dismissing any discussion right out of hand because "you've never tried it, and you prefer this" so apparently you know best anyways, is unlikely to get you anywhere. For example the suggestion about an interchangeable persistence repository model is an excellent one, yet you dismiss it, seemingly either because you don't understand the concept and why it was suggested or just have another way to do it in your head already but then you say yourself you don't know the project itself "in the deep", so who knows. The option to store more than closing price is also completely orthogonal to the persistence model, no idea why you bring that up.

Anyways, I look forward to your PR and wish you all the best

ngosang commented 3 years ago

I agree with the previous comment. You don't have enough expertise to implement this change. Not yet at least.

This feature is really useful and it will be implemented at some point because XML format does not scale. We will have to wait until someone with more expertise and free time can contribute.

pfalcon commented 1 year ago

Some further timeline on this matter:

This ticket was opened 2021-04-14, and on 2021-08-01 the PP author announced that he works on ProtoBug saving format: https://github.com/buchen/portfolio/issues/2363 . Of course, the question of SQL database backend was immediately raised, and anybody interested in that matter, should read that ticket for discussion.

pfalcon commented 1 year ago

To revive the discussion here a bit, just as mentioned in the previous comments, adding DB backend would be an effortful task. First of all, it's worth considering whether it (supporting DB) is (still) needed at all.

For example, this ticket starts with:

Never the less it is slow in opening and the data take much space

So, those matters should by now be resolved by the ProtBuf format of https://github.com/buchen/portfolio/issues/2363 .

I still think storing data in a DB is useful, but for different reasons. These reasons are best presented by @buchen, the author of PP (from https://github.com/buchen/portfolio/issues/2363#issuecomment-903223593):

My motivation is: users are only willing to input so much data if they can fully extract the data later on. CSV and similar exports can provide part of the data, yes, but the XML is the full picture.

That's definitely insightful, and I cheer for users who want to extract data from PP. The users I know however want to interoperate with PP. Because for the things PP does, it's really great. It's unrealistic to find an alternative and doesn't make a sense to make one. But for things it doesn't, ... well, it's not. And it's either not realistic to implement all the missing features - life's too short for Java. How it should be is that interested users should be able to implement external tools for that. And XML isn't exactly the format allowing the maximum flexibility of that, but in PP case, by using strange defaults of Java libraries, which for some reasons are optimized for blackbox-persistence, instead of interoperability, it's even more complicated: https://github.com/buchen/portfolio/issues/3417

Putting that all together, I guess the path to attack this problem could be:

  1. Write a set of tools which parse and allow for round-trip of PP XML files. 1.1. Take a detour to understand what and why is wrong with PP XML format, and if anything could be improved there https://github.com/buchen/portfolio/issues/3417.
  2. Take p.1 and make them store/take data to/from a DB.

When a roundtrip thru a DB will be achieved, there will be a pretty finalized DB schema. Roundtrip will ensure that the data handling thru the DB is robust. And all that already would ensure interoperability with 3rd-party tools. Then last step may be integration of DB support into PP (or not).

pfalcon commented 1 year ago
  1. Write a set of tools which parse and allow for round-trip of PP XML files.

Of course, would be nice to reuse prior art. Some existing libs for reading PP XML files that I found (I found none for writing):

There were also a couple of Java/Kotlin libs, but those are cheats as far as I can tell, just calling out to XStream lib.

None of the above give me impression of doing one thing (parse PP XML) well, but YMMV...

  1. Take p.1 and make them store/take data to/from a DB.

In https://github.com/buchen/portfolio/issues/2363#issuecomment-903223593, it's mentioned:

BTW, @tfabritius is working on syncing the PP data into a database. Currently, it is a one-way sync: from the file into the database via a Portfolio Report API.

Can't find any code for that, and from a quick description again doesn't seem like the aim is to parse PP XML into a DB (instead some module of www.portfolio-report.net)

buchen commented 1 year ago

Can't find any code for that, and from a quick description again doesn't seem like the aim is to parse PP XML into a DB

Thomas does use an API to sync to a server (which then has a database in the backend). To my knowledge, this is not productively used. The code is behind the "experimental features" flag.

pfalcon commented 11 months ago

Write a set of tools which parse and allow for round-trip of PP XML files.

I've been working on this since that comment, and actually have it working for a couple of months. I still wanted to do more refactoring, that's why I didn't release it, but I keep not having any extra time, so well, here it is: https://github.com/pfalcon/ppxml2db (I still will refactor it as time permits, including git rebases). I was able to achieve almost perfect round-tripping of the current XML format, despite the issues described in #3417. The only case where it doesn't work is with some container elements where PP is not consistent of not having a container at all vs having it empty in different cases. Notably, that applies to security events.

pfalcon commented 9 months ago

https://github.com/pfalcon/ppxml2db

Now even with README.