smartmimo opened 2 years ago
Hi! That's a very good question! Unfortunately, reliably extracting the price and currency from the price element's string is a hard problem in itself and goes beyond the scope of our work, so I can't really offer any insights. Would love to hear if workable solutions exist for this problem! Best wishes, Stefan
Hello, I'm trying to parse data from HTML pages. I can extract everything except the price, which is giving me trouble.
Sometimes the Price element contains more than just the actual price. It can also include other text, such as the price before a discount, a figure like "50% discount", or per-quantity prices like 20e / 100ml - 30e / 200ml. In general, there may be other numbers next to the actual price.
An example of this can be found in this entry:
./data/test/AT/www.mainzoo.de/8679/source.mhtml
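One way to make the extra numbers manageable is to keep, for each number, whatever directly follows it, so that a discount like 50% or a quantity like 100ml can be told apart from an amount like 20e. A minimal JavaScript sketch, assuming the number is written before its currency or unit and that a `,` or `.` followed by one or two digits is the decimal mark. `TOKEN_RE`, `parseAmount` and `extractCandidates` are hypothetical names, not part of the dataset or any library:

```javascript
// Hypothetical helpers, not taken from the original post or the dataset.
// Assumptions: the number comes before its unit/currency ("19,99 €", "20e",
// "100ml"), and "," or "." followed by 1-2 digits is the decimal mark.

// A number, optionally followed by a currency marker, "%" or a quantity unit.
const TOKEN_RE = /(\d+(?:[.,]\d+)*)\s*(€|euros?\b|eur\b|e\b|%|ml\b|l\b|kg\b|g\b)?/gi;

// Turn "1.234,56" / "19.99" / "20" into a Number. The last separator followed
// by one or two digits is read as the decimal mark; all other separators are
// treated as thousands separators, so "1.500" becomes 1500.
function parseAmount(raw) {
  const m = raw.match(/^(.*)[.,](\d{1,2})$/);
  if (m) return Number(m[1].replace(/[.,]/g, "") + "." + m[2]);
  return Number(raw.replace(/[.,]/g, ""));
}

// Tag every number in the element's text with what follows it, so that
// discounts ("50%") and quantities ("100ml") can be filtered out later.
function extractCandidates(text) {
  const candidates = [];
  for (const m of text.matchAll(TOKEN_RE)) {
    const unit = (m[2] || "").toLowerCase();
    let kind = "unknown";
    if (unit === "%") kind = "percent";
    else if (["ml", "l", "g", "kg"].includes(unit)) kind = "quantity";
    else if (unit) kind = "amount"; // €, e, eur, euro, euros
    candidates.push({ value: parseAmount(m[1]), kind });
  }
  return candidates;
}
```

For the text 20e / 100ml - 30e / 200ml this returns the amounts 20 and 30 and the quantities 100 and 200, so only two price candidates remain. A prefixed currency (€ 19,99) or space-separated thousands (1 299,00) are not recognized as amounts by this sketch.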
Not that I think it's relevant, but I'm using NodeJS (cheerio) to do the parsing. Here's how my price extraction function works:
It reads the text of the `[klarna-ai-label="Price"]` element, splits it on spaces, filters out the tokens that start with `=` (they are encoding leftovers, quoted-printable escapes, from the MHTML file), and then applies the `extractPriceFromString` function to the resulting array. That way we get an array of all the numbers (possible prices) in the element.

Normally, one would think a simple `const price = extractPriceFromString($('[klarna-ai-label="Price"]').text())` would suffice, but that's not the case here.

To sum up, I'm asking whether anybody else has the same problem, and whether you have an idea of how to get the real price from pages that have this inconvenience.
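The pipeline described above can be sketched as follows, with two extra filters added. Since the original `extractPriceFromString` is not shown, `extractNumbers` here is a hypothetical stand-in, and `pickPrice` is likewise a made-up name. Skipping tokens that contain `%` or directly follow a `/`, and taking the lowest remaining number as the current (discounted) price, are assumptions that will not hold on every page:

```javascript
// Sketch of the described pipeline plus an assumed selection heuristic.
// extractNumbers is a hypothetical stand-in for extractPriceFromString.

// Return every number in a token; the last "," or "." followed by one or two
// digits is treated as the decimal mark, other separators as thousands marks.
function extractNumbers(token) {
  const numbers = [];
  for (const m of token.matchAll(/\d+(?:[.,]\d+)*/g)) {
    const dec = m[0].match(/^(.*)[.,](\d{1,2})$/);
    numbers.push(
      dec
        ? Number(dec[1].replace(/[.,]/g, "") + "." + dec[2])
        : Number(m[0].replace(/[.,]/g, ""))
    );
  }
  return numbers;
}

// Pick one price from the text of the Price element.
function pickPrice(text) {
  // Split on whitespace, dropping empty strings and MHTML leftovers starting with "=".
  const tokens = text.split(/\s+/).filter((t) => t && !t.startsWith("="));
  const amounts = [];
  for (let i = 0; i < tokens.length; i++) {
    // Assumption: a token containing "%" is a discount, e.g. "-50%".
    if (tokens[i].includes("%")) continue;
    // Assumption: a token right after "/" is a quantity, e.g. "100ml" in "20e / 100ml".
    if (i > 0 && tokens[i - 1] === "/") continue;
    amounts.push(...extractNumbers(tokens[i]));
  }
  // Assumption: the lowest remaining amount is the current price.
  return amounts.length > 0 ? Math.min(...amounts) : null;
}
```

With cheerio it would be called as `pickPrice($('[klarna-ai-label="Price"]').text())`. For 20e / 100ml - 30e / 200ml it returns 20, the cheaper variant, which may or may not be the price the annotation intends.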