scrapinghub / extruct

Extract embedded metadata from HTML markup
BSD 3-Clause "New" or "Revised" License

JSONDecodeError: Expecting value: line 1 column 1 (char 0) for URL https://www.drogaraia.com.br/nivea-desodorante-aerosol-deep-original-150ml.html #174

Status: Open. TiagoGoddard opened this issue 3 years ago.

TiagoGoddard commented 3 years ago

Hello,

Using the CLI, I get the following error when extracting JSON-LD:

extruct "https://www.drogaraia.com.br/nivea-desodorante-aerosol-deep-original-150ml.html" --syntaxes json-ld
Failed to extract json-ld, raises Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "C:\Users\Tiago\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\extruct\_extruct.py", line 108, in extract
    output[syntax] = list(extract(document, base_url=base_url))
  File "C:\Users\Tiago\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\extruct\jsonld.py", line 25, in extract_items
    return [
  File "C:\Users\Tiago\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\extruct\jsonld.py", line 25, in <listcomp>
    return [
  File "C:\Users\Tiago\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\extruct\jsonld.py", line 38, in _extract_items
    data = jstyleson.loads(HTML_OR_JS_COMMENTLINE.sub('', script),strict=False)
  File "C:\Users\Tiago\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\jstyleson.py", line 123, in loads
    return json.loads(dispose(text), **kwargs)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1264.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 359, in loads
    return cls(**kw).decode(s)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1264.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1264.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
{
  "status": "200 OK",
  "url": "https://www.drogaraia.com.br/nivea-desodorante-aerosol-deep-original-150ml.html"
}
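For reference, Python's `json` module raises this exact message when the string it receives is empty, or when its very first character cannot start a JSON value. Leading whitespace would shift the reported position, so "char 0" hints that the script body handed to `json.loads` (after extruct's comment stripping and `jstyleson` processing) was empty or not JSON at all, rather than broken partway through. A minimal standard-library check; `decode_error` is a hypothetical helper name used only for illustration:

```python
import json


def decode_error(text):
    """Return the JSONDecodeError message for `text`, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# An empty script body reproduces the message from the traceback exactly.
print(decode_error(""))          # Expecting value: line 1 column 1 (char 0)
# Leading whitespace moves the reported position past the blanks.
print(decode_error("   "))       # Expecting value: line 1 column 4 (char 3)
# A non-JSON first character (e.g. leftover markup) also fails at char 0.
print(decode_error("<div>"))     # Expecting value: line 1 column 1 (char 0)
print(decode_error('{"a": 1}'))  # None
```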

The JSON-LD on this page is:

{
    "@context": "https:\/\/schema.org\/",
    "@type": "Product",
    "name": "Desodorante Aerosol Antitranspirante Nivea Men Deep Original Carv\u00e3o Ativado com 150ml",
    "image": [
        "https:\/\/img.drogaraia.com.br\/catalog\/product\/d\/e\/desodorante_aerosol_nivea_men_deep_original_150ml4005900707536_1__1.jpg?width=265&height=265&quality=85&type=resize"
    ],
    "sku": [
        "74312"
    ],
    "description": "O que &amp;eacute;:&amp;nbsp;Antitranspirante Nivea Men Deep possui f&amp;oacute;rmula eficaz, com Carv&amp;atilde;o Ativado que atua poderosamente contra bact&amp;eacute;rias e proporciona uma fragr&amp;acirc;ncia moderna e masculina.. Dermatologicamente testado. Para que serve:&amp;nbsp;Nivea Men Deep Antitranspirante com prote&amp;ccedil;&amp;atilde;o de 48 horas, proporcionando uma sensa&amp;ccedil;&amp;atilde;o duradoura de pele limpa e seca, exatamente como ap&amp;oacute;s o banho. O poder do carv&amp;atilde;o ativado em uma fragr&amp;acirc;ncia masculina de longa dura&amp;ccedil;&amp;atilde;o. Sensa&amp;ccedil;&amp;atilde;o de frescor e limpeza. Modo de usar:&amp;nbsp;Usar somente nas axilas com pelo menos 15cm de dist&amp;acirc;ncia. N&amp;atilde;o usar se a pele estiver irritada ou lesionada.",
    "gtin13": "4005900707536",
    "brand": {
        "type": "Brand",
        "name": "Nivea Men"
    },
    "review": [
        {
            "@type": "Review",
            "reviewRating": {
                "@type": "Rating",
                "ratingValue": "4.00",
                "bestRating": "5.00"
            },
            "author": {
                "@type": "Person",
                "name": "NORMA R."
            }
        },
        {
            "@type": "Review",
            "reviewRating": {
                "@type": "Rating",
                "ratingValue": "5.00",
                "bestRating": "5.00"
            },
            "author": {
                "@type": "Person",
                "name": "ROGERIO C."
            }
        },
        {
            "@type": "Review",
            "reviewRating": {
                "@type": "Rating",
                "ratingValue": "5.00",
                "bestRating": "5.00"
            },
            "author": {
                "@type": "Person",
                "name": "William P."
            }
        }
    ],
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.70",
        "reviewCount": 3
    },
    "offers": {
        "@type": "Offer",
        "url": "https:\/\/www.drogaraia.com.br\/nivea-desodorante-aerosol-deep-original-150ml.html",
        "priceCurrency": "BRL",
        "price": "14.90",
        "priceValidUntil": "2021-04-10",
        "itemCondition": "http:\/\/schema.org\/NewCondition",
        "availability": "http:\/\/schema.org\/InStock"
    }
}
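As far as plain JSON goes, this block looks valid: `\/` is a legal JSON escape for `/`, and `\u00e3` decodes to `ã`. A quick standard-library check on a condensed excerpt (a few fields copied from the block above and compacted onto fewer lines; the full text can be checked the same way):

```python
import json

# Condensed excerpt of the product JSON-LD above, kept as a raw string so
# the backslash escapes reach the JSON parser unchanged.
excerpt = r'''
{
    "@context": "https:\/\/schema.org\/",
    "@type": "Product",
    "name": "Desodorante Aerosol Antitranspirante Nivea Men Deep Original Carv\u00e3o Ativado com 150ml",
    "sku": ["74312"],
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.70", "reviewCount": 3}
}
'''

data = json.loads(excerpt)
print(data["@context"])                        # https://schema.org/
print("Carvão" in data["name"])                # True
print(data["aggregateRating"]["reviewCount"])  # 3
```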

I couldn't find the exact reason it fails: when I created the following test page containing only this JSON-LD, the CLI worked:

<script type="application/ld+json">
    {
    "@context": "https:\/\/schema.org\/",
    "@type": "Product",
    "name": "Desodorante Aerosol Antitranspirante Nivea Men Deep Original Carv\u00e3o Ativado com 150ml",
    "image": [
        "https:\/\/img.drogaraia.com.br\/catalog\/product\/d\/e\/desodorante_aerosol_nivea_men_deep_original_150ml4005900707536_1__1.jpg?width=265&height=265&quality=85&type=resize"
    ],
    "sku": [
        "74312"
    ],
    "description": "O que &amp;eacute;:&amp;nbsp;Antitranspirante Nivea Men Deep possui f&amp;oacute;rmula eficaz, com Carv&amp;atilde;o Ativado que atua poderosamente contra bact&amp;eacute;rias e proporciona uma fragr&amp;acirc;ncia moderna e masculina.. Dermatologicamente testado. Para que serve:&amp;nbsp;Nivea Men Deep Antitranspirante com prote&amp;ccedil;&amp;atilde;o de 48 horas, proporcionando uma sensa&amp;ccedil;&amp;atilde;o duradoura de pele limpa e seca, exatamente como ap&amp;oacute;s o banho. O poder do carv&amp;atilde;o ativado em uma fragr&amp;acirc;ncia masculina de longa dura&amp;ccedil;&amp;atilde;o. Sensa&amp;ccedil;&amp;atilde;o de frescor e limpeza. Modo de usar:&amp;nbsp;Usar somente nas axilas com pelo menos 15cm de dist&amp;acirc;ncia. N&amp;atilde;o usar se a pele estiver irritada ou lesionada.",
    "gtin13": "4005900707536",
    "brand": {
        "type": "Brand",
        "name": "Nivea Men"
    },
    "review": [
        {
            "@type": "Review",
            "reviewRating": {
                "@type": "Rating",
                "ratingValue": "4.00",
                "bestRating": "5.00"
            },
            "author": {
                "@type": "Person",
                "name": "NORMA R."
            }
        },
        {
            "@type": "Review",
            "reviewRating": {
                "@type": "Rating",
                "ratingValue": "5.00",
                "bestRating": "5.00"
            },
            "author": {
                "@type": "Person",
                "name": "ROGERIO C."
            }
        },
        {
            "@type": "Review",
            "reviewRating": {
                "@type": "Rating",
                "ratingValue": "5.00",
                "bestRating": "5.00"
            },
            "author": {
                "@type": "Person",
                "name": "William P."
            }
        }
    ],
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.70",
        "reviewCount": 3
    },
    "offers": {
        "@type": "Offer",
        "url": "https:\/\/www.drogaraia.com.br\/nivea-desodorante-aerosol-deep-original-150ml.html",
        "priceCurrency": "BRL",
        "price": "14.90",
        "priceValidUntil": "2021-04-10",
        "itemCondition": "http:\/\/schema.org\/NewCondition",
        "availability": "http:\/\/schema.org\/InStock"
    }
}
</script>
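Since the block parses on its own, one plausible (unconfirmed) explanation is that the live page contains another `application/ld+json` script that is empty or not JSON. The traceback shows extruct building its result in a single list comprehension, so one bad block makes the whole `json-ld` extraction fail. A minimal standard-library sketch to test that hypothesis: it collects the body of every `<script type="application/ld+json">` in an HTML string and reports which ones `json.loads` rejects. `LdJsonCollector` and `report` are hypothetical names, and this uses plain `json.loads(..., strict=False)` rather than extruct's own comment-stripping and `jstyleson` pipeline, so results may differ slightly.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_ldjson = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Valueless attributes come through as None, hence the `or ""`.
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_ldjson = True
                self.blocks.append("")  # an empty script stays as ""

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ldjson = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks; concatenate them.
        if self._in_ldjson:
            self.blocks[-1] += data


def report(html):
    """Return (index, ok, detail) for each JSON-LD block in `html`."""
    collector = LdJsonCollector()
    collector.feed(html)
    collector.close()
    results = []
    for i, block in enumerate(collector.blocks):
        try:
            json.loads(block, strict=False)
            results.append((i, True, "ok"))
        except json.JSONDecodeError as exc:
            # Show the start of the offending block so it is easy to find.
            results.append((i, False, f"{exc}; starts with {block[:40]!r}"))
    return results


sample = """
<html><head>
<script type="application/ld+json">{"@type": "Product", "name": "x"}</script>
<script type="application/ld+json"></script>
</head></html>
"""
# Block 0 parses; block 1 (empty) fails with the same "char 0" error.
for row in report(sample):
    print(row)
```

To check the real page, pass `report` the HTML of the product page (for example, read from a copy saved to disk) instead of `sample`.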
TiagoGoddard commented 3 years ago

I tested with the Python library and got the same error.

markavale commented 3 years ago

Same issue here. What could be a possible solution?