portfolio-performance / portfolio

Track and evaluate the performance of your investment portfolio across stocks, cryptocurrencies, and other assets.
http://www.portfolio-performance.info
Eclipse Public License 1.0

Generic HTML Price/Value Scraper #3968

Open jat255 opened 2 months ago

jat255 commented 2 months ago

Is your feature request related to a problem? Please describe.

Some assets have information about their value available online, but often it is not in the nicely structured JSON document or HTML table that PP currently supports, or no public API is available. Examples include the current value of a piece of real estate (Zillow, Redfin, etc.), arbitrary funds that are not exchange-traded, or other physical assets (collectibles, etc.).

Describe the solution you'd like

A useful addition to PP would be an "HTML parser/scraper" quote feed that takes a URL to fetch and a "selector" string (similar to how the JSON parser works). Possible query languages include XPath, CSS selectors, or something else. Because this would mainly be useful for ongoing quote fetching, I would expect the quote's date to be set to the date on which the price is fetched (unlike the HTML table feed, which requires the date to be stated explicitly).
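A minimal sketch of the proposed feed, using the JDK's built-in XPath support (javax.xml.xpath) rather than any existing PP code: a selector is applied to the page, the matched text is reduced to a number, and the quote is dated with the day it was fetched. It assumes well-formed markup without an XHTML namespace declaration; most real pages are not well-formed XML, so an actual feed would first need an HTML-tolerant parser such as jsoup. The class XPathQuoteSketch, the Quote record, and the sample page are hypothetical, the page is passed in as a string instead of being downloaded from a URL, and the number cleanup assumes US-style separators.

```java
import java.io.IOException;
import java.io.StringReader;
import java.math.BigDecimal;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XPathQuoteSketch {

    // Hypothetical quote type: a single value dated on the day it was fetched
    public record Quote(LocalDate date, BigDecimal value) {}

    /** Applies an XPath selector to a well-formed page and builds a quote dated {@code today}. */
    public static Quote scrape(String page, String xpath, LocalDate today) {
        String text;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so a fetched page cannot pull in external entities
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(page)));
            // Returns the string value of the first matching node, or "" if nothing matches
            text = XPathFactory.newInstance().newXPath().evaluate(xpath, doc).trim();
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not apply selector " + xpath, e);
        }
        // Keep digits, '.' and '-' only, e.g. "$512,300" -> "512300" (assumes US-style separators)
        String numeric = text.replaceAll("[^0-9.-]", "");
        if (numeric.isEmpty()) {
            throw new IllegalArgumentException("No number found for selector " + xpath);
        }
        return new Quote(today, new BigDecimal(numeric));
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<table><tr><td>Beds</td><td>3</td></tr></table>"
                + "<table><tr><td>Baths</td><td>2</td></tr></table>"
                + "<table><tr><td>Estimate</td><td>$512,300</td></tr></table>"
                + "</body></html>";
        // Third table in document order, second cell of its rows (XPath positions are 1-based)
        Quote quote = scrape(page, "(//table)[3]//tr/td[2]", LocalDate.now());
        System.out.println(quote.date() + " " + quote.value());
    }
}
```

XPath positions are one-based, and (//table)[3] picks the third table in document order, whereas //table[3] would mean a table that is the third table child of its parent.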

Additional context

I am not an experienced Java programmer, but after a quick look, it appears there are some libraries that might provide this "HTML parsing" functionality:

This functionality is available in Ghostfolio, which uses the Cheerio JavaScript library for it.

Morpheus1w3 commented 2 months ago

I like the idea of selecting the data with something like "table:eq(2) > tr > td:eq(1)" when the third table and second column are requested. Also, jsoup is already used in Portfolio Performance.

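import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class TableCellSelector {
    public static void main(String[] args) {
        // Example URL of a page containing tables
        String url = "http://example.com/table.html";

        try {
            // Fetch the page and parse it with jsoup
            Document doc = Jsoup.connect(url).get();

            // Select the second cell of every body row in a table that is the third
            // element child of its parent. Notes on jsoup's selector semantics:
            //  - ":eq(n)" matches by zero-based sibling index, not by position among
            //    all matches, so "table:eq(2)" is not necessarily the third table on the page.
            //  - jsoup's HTML parser inserts an implicit <tbody> around rows, so
            //    "table > tr" would match nothing; the path has to include tbody.
            Elements cells = selectTableCellsInColumn(doc, "table:eq(2) > tbody > tr > td:eq(1)");
            for (Element cell : cells) {
                System.out.println(cell.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Thin wrapper around Document.select, which accepts any CSS selector
    public static Elements selectTableCellsInColumn(Document doc, String selector) {
        return doc.select(selector);
    }
}
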
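The snippet above prints each cell's display text, which for a price is often something like "$512,300" or "1.234,56 EUR" rather than a clean number. A minimal sketch of turning such text into a BigDecimal with java.text.NumberFormat follows; the class ScrapedNumberSketch and the method parseScrapedNumber are hypothetical, and the caller is assumed to know the page's locale (a scraper feed would probably need that as a setting next to the URL and selector).

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class ScrapedNumberSketch {

    /**
     * Parses a scraped display string such as "$512,300" into a BigDecimal,
     * using the grouping and decimal separators of the given locale.
     */
    public static BigDecimal parseScrapedNumber(String text, Locale locale) {
        // Strip leading decoration (currency symbols, labels, spaces) before the first
        // digit or minus sign, and trailing decoration after the last digit
        String trimmed = text.strip().replaceAll("^[^0-9-]+|[^0-9]+$", "");

        NumberFormat format = NumberFormat.getNumberInstance(locale);
        if (format instanceof DecimalFormat decimalFormat) {
            // Return BigDecimal instead of Double to avoid binary rounding of prices
            decimalFormat.setParseBigDecimal(true);
        }

        ParsePosition position = new ParsePosition(0);
        Number number = format.parse(trimmed, position);
        // Reject empty input and text that was only partly consumed
        if (number == null || position.getIndex() != trimmed.length()) {
            throw new IllegalArgumentException("Not a number for " + locale + ": " + text);
        }
        return number instanceof BigDecimal bd ? bd : new BigDecimal(number.toString());
    }

    public static void main(String[] args) {
        System.out.println(parseScrapedNumber("$512,300", Locale.US));
        System.out.println(parseScrapedNumber("1.234,56 EUR", Locale.GERMANY));
    }
}
```

Passing the locale explicitly matters: under Locale.GERMANY, "512,300" would parse as 512.3, and checking the ParsePosition rejects text that was only partly parsed, such as "1.234,56" under Locale.US.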
jat255 commented 2 months ago

@Morpheus1w3 it's good to hear there's already a library in use that could do this. I worked on setting up a development environment to see if I could hack something together, but this might be beyond my current skill set.