postlight / parser

📜 Extract meaningful content from the chaos of a web page
https://reader.postlight.com
Apache License 2.0
5.4k stars 442 forks

[help] Your gitter is dead so I'm forced to post here #620

Open working-name opened 3 years ago

working-name commented 3 years ago

I can't work out what I'm misunderstanding here: https://github.com/postlight/mercury-parser/blob/master/src/extractors/custom/README.md#using-transforms

Mercury returns everything as if there were no transforms. My code:

      accreditations: {
        selectors: ['#accreditations'],
        allowMultiple: true,
        clean: [
          'div.description.photos', 
          'div.review-stars', 
          'img.edit', 
          'span.facility-reviews', 
          'reviews-container-mobile',
          'reviews-container',
          '.map-placeholder'
        ],

        transforms: {
          /**
           * This doesn't work. It's essentially the same as https://github.com/postlight/mercury-parser/blob/master/src/extractors/custom/README.md#using-transforms but Mercury blows right through it. I guess I have to clean up after it, which renders this tool fairly useless.
           */
          p: node => {
            const children = node.children(0);
            const lilkid = children._root.get(0).children[0].data;
            // console.log(lilkid);

            if (lilkid.match(/verified in|Member:|Accreditation:/)) {
              return lilkid;
            }

            return null;
          },
        },
      },
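
One possible reading of the linked README section is that a string returned from a function transform is treated as a tag name to convert the node to, not as replacement content, and that returning `null` leaves the node untouched. If that reading is right, the transform above never filters anything. The following is a minimal sketch under that assumption: it removes non-matching paragraphs from inside the transform instead of returning their text. `shouldKeep` is a hypothetical helper, and `node` is assumed to be a cheerio-wrapped element exposing `.text()` and `.remove()`.

```javascript
// Pattern for the paragraphs worth keeping (copied from the snippet above).
const KEEP = /verified in|Member:|Accreditation:/;

// Hypothetical helper: decides whether a paragraph's text should survive.
function shouldKeep(text) {
  return KEEP.test(text);
}

// Sketch of the `transforms` entry. Assumption: `node` is a cheerio-wrapped
// <p> element, so `.text()` and `.remove()` are available on it.
const transforms = {
  p: node => {
    if (!shouldKeep(node.text())) {
      // Drop the paragraph from the extracted content.
      node.remove();
    }
    // Return null so that no tag conversion is requested.
    return null;
  },
};
```

Keeping the match logic in its own function makes it possible to check the pattern without running the parser at all.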