webcompat / webcompat.com

Source code for webcompat.com
https://webcompat.com

Access issues by domain names (Atom feed) #132

Closed karlcow closed 6 years ago

karlcow commented 10 years ago

It can be useful for a Web site to be able to know the status of all issues impacting a domain name.

Searching by a domain name, e.g. example.org, should give the list of all the issues related to this domain name.

(added in March 2017)

miketaylr commented 9 years ago

This is our RSS feature.

hallvors commented 9 years ago

788 will help

miketaylr commented 8 years ago

No published branches yet, but @karlcow has a prototype in progress on his laptop. Assigning to him.

miketaylr commented 8 years ago

Closing https://github.com/webcompat/webcompat.com/issues/60#issuecomment-226489939 as a dupe of this.

karlcow commented 7 years ago

Preserving some things I had done for #60 so I can delete my local branch.

for webcompat/views.py

@app.route('/feeds/<domain_name>')
def domain_feed(domain_name):
    '''Route to display a feed for a domain name.

    - domain_name would be `mozilla.org`.
    - should make a search of all titles, numbers, latest comment date
    '''
    # User is probably not necessary here.
    if g.user:
        get_user_info()
    # Search the domain_name and return a JSON with relevant data
    # (to be defined in helpers).
    domain_data = feed_summary(domain_name)
    # render_template takes the context as keyword arguments
    return render_template('feed.atom', domain_data=domain_data)

karlcow commented 7 years ago

Made the first comment more descriptive with the list of things to do.

karlcow commented 7 years ago

Note to self (it will grow with time):

There are a couple of ways to do this, each worth exploring. I need to assess the impact of each choice and the likelihood of creating a performance problem for the application.

Some possibilities:

  1. Generate the feed through a search query each time there is a request for the feed.
    • Pro: Information always fresh
    • Con: A search query is created at each request. Even with caching information, feed reader apps are not very respectful of HTTP best practices. So they will hit the server every time. That might exhaust our search rate limit.
  2. Generate a static feed at first request, once an hour or once a day. Deliver the static file for each subsequent request. There might even be a Flask extension that already does this (to research).
    • Pro: Cache/Performance friendly. We have in cache only the domain name feeds which have been requested and not all domains names.
    • Con: Information age == defined by the cache we are creating (1 day old for example)
  3. Generate feeds once a day with a cron job, for every known domain we currently have on webcompat.com
    • Pro: Cache/Performance friendly.
    • Con: same issues as 2., and useless feeds kept around.
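Option 2 above could be sketched as follows. The builder callable, the cache directory, and the one-hour max age are assumptions for illustration, not project code:

```python
import os
import time


def cached_feed_path(domain_name, build_feed, feed_dir='data/feed', max_age=3600):
    """Return the path of the cached feed for domain_name.

    (Re)generate the file when it is missing or older than max_age
    seconds; otherwise serve the static copy. build_feed(domain_name)
    is a hypothetical builder returning the feed as a string.
    """
    path = os.path.join(feed_dir, domain_name + '.atom')
    stale = (not os.path.exists(path)
             or time.time() - os.path.getmtime(path) > max_age)
    if stale:
        os.makedirs(feed_dir, exist_ok=True)
        with open(path, 'w') as feed_file:
            feed_file.write(build_feed(domain_name))
    return path
```

This keeps the per-request cost at a stat call plus a file read, which is what makes the approach friendly to feed readers that ignore HTTP caching.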

Some additional issues:

Some possible dependencies/information:

karlcow commented 7 years ago

Ahaha. Brace for impact and its controversy. https://twitter.com/search?f=tweets&vertical=default&q=http%3A%2F%2Fjsonfeed.org%2F&src=typd

karlcow commented 7 years ago

Let's start the experiment. Code! 🚨 And we will see if we have to throw everything. 🗑

karlcow commented 7 years ago

The main thing I will be experimenting with is the creation of static files, either generated on first request or by a cron job.

karlcow commented 7 years ago

There is a feed feature in Werkzeug to keep in mind. http://flask.pocoo.org/snippets/10/
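The linked snippet relies on `werkzeug.contrib.atom.AtomFeed`; note that `werkzeug.contrib` was removed in Werkzeug 1.0, so it may not be an option long term. As a fallback, the Atom document is simple enough to build with the standard library. The entry shape (title, link, updated timestamp) is an assumption here, not webcompat's actual data model:

```python
from xml.etree import ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'


def build_atom(domain, entries):
    """Build a minimal Atom feed for a domain's issues.

    entries is a list of (title, link, updated_iso) tuples.
    """
    ET.register_namespace('', ATOM_NS)
    feed = ET.Element('{%s}feed' % ATOM_NS)
    ET.SubElement(feed, '{%s}title' % ATOM_NS).text = 'Issues for %s' % domain
    ET.SubElement(feed, '{%s}id' % ATOM_NS).text = (
        'https://webcompat.com/feeds/%s' % domain)
    for title, link, updated in entries:
        entry = ET.SubElement(feed, '{%s}entry' % ATOM_NS)
        ET.SubElement(entry, '{%s}title' % ATOM_NS).text = title
        ET.SubElement(entry, '{%s}link' % ATOM_NS, href=link)
        ET.SubElement(entry, '{%s}updated' % ATOM_NS).text = updated
    return ET.tostring(feed, encoding='unicode')
```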

karlcow commented 7 years ago

Dumping ideas. Notebook style. 📓

🐍 pseudo-code

@feeds.route('/<domain>', methods=['GET'])
def domain_feed(domain):
    """Serve a feed for a specific domain name."""
    # Have we handled this domain already?
    if is_known_domain(domain):
        # Do we have a static atom feed file for it?
        if not is_static_feed(domain):
            # Let's create the feed in data/feed/
            create_feed(domain)
        # we can serve the feed to users.
        return serve_domain_feed(domain)
    else:
        # if we don't know anything we return 404
        return (
            '{domain} has no feed'.format(domain=domain),
            404,
            {'Content-Type': 'text/plain'})

I want to minimize the impact of badly behaved feed readers. No matter how much caching you set on feed resources, many feed readers ignore it and request the feed every couple of minutes. So to avoid generating a feed on each request, I want to serve a static file generated on the first request.

Another benefit is we get files only for domains that people are interested in.

An interesting question will come up with updating, but let's say it's an issue we have to deal with later.

Some issues 🚨

karlcow commented 7 years ago

Data quality is interesting… From a dump I have of all the issues as of July 2017 (around 7,920 issues), the domain names are not always present, or are bogus, or follow irregular patterns.

So far I found 370 issues with bogus domains. I tried to cover as many of the possible patterns as I could.

title is the issue title, so something à la www.nytimes.com - desktop site instead of mobile site

Current version. Will evolve.

import urlparse  # Python 2; urllib.parse in Python 3


def extract_domain_name(title, issue_number):
    """Extract the domain name from the title string."""
    # a domain name doesn't contain spaces
    candidate = title.split(' ', 1)[0]
    # domain names are lowercase
    candidate = candidate.lower()
    # a domain name contains at least one "."
    if '.' not in candidate:
        return 'BOGUS', issue_number
    # Tuple of bogus patterns to check against
    bogus_start_patterns = ('resource://', 'file://', 'chrome://')
    if candidate.startswith(bogus_start_patterns):
        return 'BOGUS', issue_number
    # it contains a domain name.
    if candidate.startswith('view-source:'):
        candidate = candidate.split('view-source:')[1]
    if ':' in candidate and not candidate.startswith('http'):
        candidate = candidate.split(':')[0]
        candidate = 'http://{}'.format(candidate)
    # some issues start with http; clean those up.
    if candidate.startswith('http://') or candidate.startswith('https://'):
        candidate = urlparse.urlsplit(candidate).netloc
        candidate = candidate.split(':')[0]
    # some domains come with a path
    if '/' in candidate:
        candidate = candidate.split('/')[0]
    # some bogus domains come with &
    if '&' in candidate:
        candidate = candidate.split('&')[0]
    # Handling local/private addresses
    local_patterns = ('10.', '127.0.0.1', '192.168.', '172.')
    if candidate.startswith(local_patterns):
        return 'BOGUS', issue_number
    return candidate.encode('utf-8')

Some of the ones I'm fixing on the fly could be fixed once and for all. I could spit out a FIXME for those, so the data quality improves for the next run.

I will re-run it soon with a fresh issue dump.

There are still some issues where the domain name differs from the URL: line in the body. I can probably create an additional check to extract these and compare.

This is just for dumping a DB of domain names to generate feeds, but it could ultimately be reused to normalize the data we receive from people.
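That comparison step could look like the sketch below. The `URL:` prefix comes from webcompat's issue-reporting template; the helper names and the rest of the logic are mine, for illustration:

```python
from urllib.parse import urlsplit


def body_domain(body):
    """Extract the host from the 'URL:' line of an issue body."""
    for line in body.splitlines():
        if line.startswith('URL:'):
            url = line[len('URL:'):].strip()
            return urlsplit(url).netloc.lower()
    return ''


def domains_mismatch(title_domain, body):
    """True when the body URL host exists and differs from the title domain."""
    host = body_domain(body)
    return bool(host) and host != title_domain
```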

 259 www.youtube.com
 125 www.facebook.com
 112 www.google.com
 110 vk.com
  76 www.netflix.com
  76 m.youtube.com
  65 web.whatsapp.com
  62 m.facebook.com
  55 webcompat.com
  53 addons.mozilla.org
  47 www.coco.fr
  40 twitter.com
  34 music.yandex.ru
  33 www.mozilla.org
  33 s0.2mdn.net
  32 www.twitch.tv
  32 support.mozilla.org
  32 mail.google.com
  27 github.com
  21 www.reddit.com
  20 www.amazon.com
  20 mega.nz
  19 www.pandora.com
  19 www.amazon.in
  19 play.google.com
  18 www.amazon.de
  18 mobile.twitter.com
  17 www.amazon.co.jp
  17 apps.facebook.com
  16 www.hulu.com
  16 www.bing.com
  15 www.primevideo.com
  15 www.linkedin.com
  15 radio.garden
  15 accounts.google.com
  14 www.yahoo.com
  14 outlook.live.com
  13 mailmanager.cityweb.de
  13 inbox.google.com
  13 imgur.com
  13 docs.google.com
  13 chaturbate.com
  12 www.theverge.com
  12 www.nasa.gov
  12 www.google.co.in
  12 video.corriere.it
  12 sj.myie9.com
  12 g1.globo.com
  12 drive.google.com
  12 developer.apple.com

Google properties:

 112 www.google.com
  32 mail.google.com
  19 play.google.com
  15 accounts.google.com
  13 inbox.google.com
  13 docs.google.com
  12 www.google.co.in
  12 drive.google.com
  11 google.com
   7 www.google.ca
   7 news.google.com
   5 www.google.ro
   5 www.google.fr
   5 support.google.com
   5 images.google.com
   4 www.google.com.mx
   4 www.google.co.uk
   4 translate.google.com
   4 tpc.googlesyndication.com
   4 plus.google.com
   4 hangouts.google.com
   4 fonts.google.com
   3 www.google.se
   3 www.google.it
   3 www.google.com.br
   3 www.google.co.jp
   3 groups.google.com
   3 developers.google.com
   2 www.googleadservices.com
   2 www.google.ru
   2 www.google.de
   2 www.google.com.pk
   2 www.google.com.eg
   2 voice.google.com
   2 santatracker.google.com
   2 photos.google.com
   2 news.google.co.in
   2 keep.google.com
   2 insideabbeyroad.withgoogle.com
   2 gmail.google.com
   2 calendar.google.com
   1 www.google.sk
   1 www.google.pt
   1 www.google.me
   1 www.google.hu
   1 www.google.es
   1 www.google.com.vn
   1 www.google.com.ua
   1 www.google.com.sa
   1 www.google.com.co
   1 www.google.com.bd
   1 www.google.co.th
   1 www.google.co.id
   1 www.google.ch
   1 www.google.bg
   1 www.drive.google.com
   1 trends.google.com
   1 translate.googleusercontent.com
   1 translate.google.ro
   1 translate.google.co.kr
   1 testmysite.thinkwithgoogle.com
   1 svg-edit.googlecode.com
   1 streetart.withgoogle.com
   1 storage.googleapis.com
   1 sites.google.com
   1 scholar.google.com
   1 r4---sn-4g5edn7s.googlevideo.com
   1 r3---sn-gwpa-itqd.googlevideo.com
   1 r2---sn-4g5edned.googlevideo.com
   1 productforums.google.com
   1 privacy.google.com
   1 opensource.google.com
   1 news.google.com.tw
   1 news.google.com.br
   1 myaccount.google.com
   1 googleweblight.com
   1 google.co.in
   1 enterprise.google.com
   1 encrypted.google.com
   1 earth.google.com
   1 console.cloud.google.com
   1 com.google
   1 codelabs.developers.google.com
   1 chrome.google.com
   1 books.google.de
   1 books.google.ca
   1 apps.google.com
   1 analytics.googleblog.com
karlcow commented 7 years ago

Do we create a feed when there is no valid issue associated with this domain?

karlcow commented 7 years ago

Ah … crap…

Once the BOGUS titles are removed, we still have quite a lot of differences between titles and URLs. And a lot of recent issues. That comes from Softvision not entering the same domain for the title and the URL. I think they fixed it after I mentioned it, but I didn't realize we had so many bad ones.

I need to fix this, automatically if I prepare the data well. 😭

Below (issue_number, title_domain, URL_domain)

(1005, 'jal.co.jp', 'sp5971.jal.co.jp')
(1052, 'excite.co.jp', 'a.excite.co.jp')
(1053, 'excite.co.jp', 'a.excite.co.jp')
(1083, 'btv.cat', 'www.btv.cat')
(110, 'webcrawler.com', 'www.webcrawler.com')
(1139, 'lastampa.it', '')
(1145, 'smo.suumo.jp', 'smp.suumo.jp')
(1161, 'bosch-home.pl', 'www.bosch-home.pl')
(1182, 'video.gazzetta.it', '')
(1183, 'video.gazzetta.it', '')
(1184, 'sportmediaset.mediaset.it', '')
(1185, 'video.corriere.it', '')
(1242, 'menshealth.com', 'www.menshealth.com')
(1257, 'menshealth.com', 'www.menshealth.com')
(1267, 'womenshealthmag.com', 'www.womenshealthmag.com')
(1285, 'm.facebook.com', 'spam-removed')
(1301, 'menshealth.com', 'www.menshealth.com')
(139, 'moleskine.com', 'www.moleskine.com')
(1409, 'webcompat.com', 'support.mozilla.org')
(141, 'virginamerica.com', 'www.virginamerica.com')
(1528, 'docs.google.com', 'goo.gl')
(1591, 'www.facebook.com', 'spam-removed')
(1592, 'www.facebook.com', 'spam-removed')
(1593, 'www.facebook.com', 'spam-removed')
(1595, 'www.facebook.com', 'spam-removed')
(1596, 'www.facebook.com', 'spam-removed')
(1597, 'www.facebook.com', 'spam-removed')
(1598, 'www.facebook.com', 'spam-removed')
(1601, 'www.facebook.com', 'spam-removed')
(1602, 'www.facebook.com', 'spam-removed')
(1603, 'www.facebook.com', 'spam-removed')
(1604, 'www.facebook.com', 'spam-removed')
(1605, 'www.facebook.com', 'spam-removed')
(1611, 'www.facebook.com', 'spam-removed')
(1612, 'www.facebook.com', 'spam-removed')
(1614, 'www.facebook.com', 'spam-removed')
(1616, 'www.facebook.com', 'spam-removed')
(1617, 'www.facebook.com', 'spam-removed')
(1652, 'm.facebook.com', '')
(1687, 'www.fb.com', 'spam-removed')
(1688, 'm.fb.com', '')
(1689, 'www.fb.com', 'spam-removed')
(1690, 'm.fb.com', 'spam-removed')
(1691, 'www.fb.com', 'spam-removed')
(1692, 'www.fb.com', 'spam-removed')
(174, 'www.jetblue.com', 'jetblue.com')
(1807, 'amazon.com', 'https:')
(1850, 'mozillafestival.org', '2015.mozillafestival.org')
(1917, 'www.flipkart.com', 'www')
(1995, '8888.186tcye.pw', '')
(20, 'crosswalkdp.com', 'www.crosswalkdp.com')
(2001, 'glasses.com', 'www.glasses.com')
(2007, 'bioskop21.id', '')
(2008, 'bioskop21.id', '')
(2017, 'appinstallsmobi.com', '')
(2019, 'webcompat.com', 'www.6666hh.com')
(207, 'nfl.com', 'www.nfl.com')
(2107, 'allindiaradio.govt.in', 'allindiaradio.gov.in')
(2181, 'm.facebook.com', '')
(2232, 'm.facebook.com', '')
(2240, 'yuku.com', 'www.yuku.com')
(2243, 'www.sz-runxin.com', '')
(2314, 'saa.qualtrics.com', '')
(2396, 'hotmoza.com', '')
(2433, 'www.marcoborla.it', '')
(2476, 'barbershop.org', 'ebiz.barbershop.org')
(2498, 'video.js', 'github.com')
(2499, 'discovery.com', 'www.discovery.com')
(2502, '1g22.com', '')
(2503, '1g22.com', '')
(2505, 'jornada.una.mx', 'www.jornada.unam.mx')
(2740, 'oneviewcalendar.com', 'www.oneviewcalendar.com')
(28, 'webcompat.com', 'github.com')
(2822, 'dragon8.troyhero.com', '')
(2823, '546r.com', '')
(2884, 'www.tangerine.ca', 'secure.tangerine.ca')
(2891, 'www.luludai.cc', '')
(3, 'volcanicpixels.com', 'www.volcanicpixels.com')
(3066, 'chromestatus.com', 'www.chromestatus.com')
(3146, 'mobile22.gameassists.co.uk', '`http')
(3464, 'codepen.io', '')
(3623, 'largepenissociety.tumblr.com', 'large*society.tumblr.com')
(372, 'cbc.ca', '')
(3835, 'outlook.live.com', '')
(385, 'pch.sweeps.com', 'pch sweeps.com')
(399, 'www.hwbank.it,', 'www.hwbank.it, www.netxhs.it')
(4119, 'www.', 'www. webcompat.com')
(45, 'http.req.url.http_url_safe', 'www.ibm.com')
(4729, 'm.weibo.cn', 'm.weibo.cn -  swipe gesture issue')
(490, 'm.spiegel.de', 'm.spiegel.de   or   spiegel.de')
(4979, 'www.facebook.com', '')
(4987, 'answers.yahoo.com', '')
(5007, 'www.reddit.com', '')
(5008, 'www.reddit.com', '')
(5009, 'www.reddit.com', '')
(5011, 'www.reddit.com', '')
(5012, 'www.twitter.com', '')
(5070, 'www.linkedin.com', '')
(5073, 'www.linkedin.com', '')
(51, 'expedia.co.jp', 'www.expedia.co.jp')
(521, 'okcupid.com', 'www.okcupid.com')
(53, 'nascarwagers.com', 'www.nascarwagers.com')
(54, 'xvideos.com', 'www.xvideos.com')
(5488, 'www.xvideos.com', '')
(5489, 'www.indeed.com', 'indeed.com')
(5509, 'www.spotify.com', 'open.spotify.com')
(5566, 'www.bestbuy.com', 'www.bestbuy-jobs.com')
(5568, 'www.bestbuy.com', 'www.bestbuy-jobs.com')
(5573, 'www.deals.bestbuy.com', 'deals.bestbuy.com')
(5589, 'www.baidu.com.com', 'goo.gl')
(5591, 'www.baidu.com', 'music.baidu.com')
(5592, 'www.baidu.com', 'voice.baidu.com')
(5593, 'www.baidu.com', 'voice.baidu.com')
(5602, 'www.baidu.com', 'goo.gl')
(5604, 'www.disney.com', 'm.disneystore.com')
(5605, 'www.disney.com', 'm.disneystore.com')
(5654, 'www.homedepot.com', 'm.homedepot.com')
(5656, 'www.homedepot.com', 'm.homedepot.com')
(57, 'momondo.com', 'm.momondo.com')
(59, 'independent.co.uk', 'www.independent.co.uk')
(5906, 'www.rumble.com', 'rumble.com')
(5910, 'www.rumble.com', 'rumble.com')
(5936, 'm.privacy2browsing.com', '[removed]')
(5949, 'www.rumble.com', 'rumble.com')
(5953, 'www.rumble.com', 'rumble.com')
(5957, 'www.rumble.com', 'rumble.com')
(6016, 'www.gomovies.to', 'gomovies.to')
(6021, 'www.gomovies.to', 'gomovies.to')
(6023, 'www.gomovies.to', 'gomovies.to')
(6049, 'www.gomovies.to', 'gomovies.to')
(6141, 'youtube.com', 'www.youtube.com')
(617, 'grammarly.com', '')
(6183, 'www.citi.com', 'online.citi.com')
(6184, 'www.citi.com', 'www.privatebank.citibank.com')
(6186, 'www.citi.com', 'www.privatebank.citibank.com')
(6195, 'www.businessinsider.com', 'intelligence.businessinsider.com')
(6216, 'www.wikipedia.org', 'goo.gl')
(6217, 'www.wikipedia.org', 'en.m.wikipedia.org')
(6218, 'www.wikipedia.org', 'en.m.wikipedia.org')
(6219, 'www.wikipedia.org', 'en.m.wikipedia.org')
(6248, 'www.wikipedia.org', 'en.m.wikivoyage.org')
(6250, 'www.wikipedia.org', 'm.mediawiki.org')
(6254, 'www.yahoo.com', 'fr.yahoo.com')
(6255, 'www.yahoo.com', 'research.yahoo.com')
(6256, 'www.yahoo.com', 'research.yahoo.com')
(6397, 'www.yahoo.com', 'fr.yahoo.com')
(6398, 'www.yahoo.com', 'login.yahoo.com')
(6400, 'www.yahoo.com', 'fr.sports.yahoo.com')
(6402, 'www.yahoo.com', 'fr.finance.yahoo.com')
(6403, 'www.yahoo.com', 'fr.finance.yahoo.com')
(6411, 'www.lemonde.fr', 'abo.lemonde.fr')
(6412, 'www.lemonde.fr', 'secure.lemonde.fr')
(6435, 'www.lemonde.fr', 'moncompte.lemonde.fr')
(644, 'inbox.google.com', '')
(6447, 'www.ebay.fr', 'm.ebay.fr')
(6463, 'www.ebay.fr', 'csr.ebay.fr')
(6471, 'www.ebay.fr', 'csr.ebay.fr')
(6475, 'www.ebay.fr', 'm.ebay.fr')
(6477, 'www.ebay.fr', 'm.ebay.fr')
(6499, 'www.allocine.fr', 'secure.allocine.fr')
(6567, 'www.sfr.fr', 'assistance.sfr.fr')
(6576, 'www.lequipe.fr', 'm.lequipe.fr')
(6577, 'www.ebay.fr', 'csr.ebay.fr')
(6584, 'youtube.com', 'www.youtube.com')
(6593, 'www.lequipe.fr', 'm.lequipe.fr')
(6595, 'www.lequipe.fr', 'm.lequipe.fr')
(6628, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(6629, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(6633, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(6656, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(6658, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(666, 'mint.com', 'javascript')
(6663, 'www.aliexpress.com', 'm.fr.aliexpress.com')
(6668, 'www.tumblr.com', 'goo.gl')
(6725, 'www.stackoverflow.com', 'stackoverflow.com')
(6728, 'disqus.com', 'stackoverflow.blog')
(6786, 'www.bfmtv.com', 'rmc.bfmtv.com')
(6789, 'www.leparisien.fr', 'm.leparisien.fr')
(6790, 'www.leparisien.fr', 'm.leparisien.fr')
(6791, 'www.leparisien.fr', 'connect.leparisien.fr')
(6830, 'www.fnac.com', 'secure.fnac.com')
(6893, 'www.societegenerale.fr', 'm.particuliers.societegenerale.fr')
(6896, 'www.societegenerale.fr', '3qv7.la1-c1-frf.salesforceliveagent.com')
(6912, 'www.bouyguestelecom.fr', 'www.mon-compte.bouyguestelecom.fr')
(6914, 'www.bouyguestelecom.fr', 'www.assistance.bouyguestelecom.fr')
(6916, 'www.bouyguestelecom.fr', 'forum.bouyguestelecom.fr')
(6945, 'www.laposte.net', 'compte.laposte.net')
(6955, 'www.ok.ru', 'm.ok.ru')
(696, 'ign.com', 'in.ign.com')
(6973, 'www.ok.ru', 'm.ok.ru')
(6979, 'www.ok.ru', 'm.ok.ru')
(6999, 'www.ouest-france.fr', 'www.ouestfrance-immo.com')
(7, 'youtube.com', 'm.youtube.com')
(7002, 'www.ouest-france.fr', 'www.ouestfrance-immo.com')
(7006, 'www.ouest-france.fr', 'www.ouestfrance-immo.com')
(7012, 'www.deezer.com', 'support.deezer.com')
(7041, 'youtube.com', 'https:')
(707, 'myatt.com', 'myatt.com or http')
(7071, 'www.leroymerlin.fr', 'communaute.leroymerlin.fr')
(7099, 'www.libertyland.co', 'libertyland.co')
(7118, 'www.libertyland.co', 'libertyland.co')
(7119, 'www.mabanque.bnpparibas', 'mabanque.bnpparibas')
(7126, 'www.mabanque.bnpparibas', 'mabanque.bnpparibas')
(7127, 'www.mabanque.bnpparibas', 'mabanque.bnpparibas')
(7186, 'www.liberation,fr', 'www.liberation.fr')
(7188, 'www.vimeo.com', 'vimeo.com')
(7289, 'www.google.com', 'google.com')
(7290, 'www.google.com', 'google.com')
(7292, 'www.google.com', 'google.com')
(7293, 'www.google.com', 'google.com')
(7296, 'www.google.com', 'google.com')
(7298, 'www.google.com', 'google.com')
(7299, 'www.google.com', 'google.com')
(7304, 'www.google.com', 'google.com')
(7305, 'www.google.com', 'google.com')
(7309, 'www.google.com', 'google.com')
(7311, 'www.google.com', 'google.com')
(7319, 'www.google.com', 'google.com')
(7323, 'www.google.com', 'google.com')
(7356, 'www.google.com', 'google.com')
(74, 'www.fresno.courts.ca.gov', '')
(7409, 'www.hotstart.com', 'www.hotstar.com')
(7422, 'www.ntd.tv', 'mb.ntd.tv')
(7424, 'www.ntd.tv', 'mb.ntd.tv')
(7441, 'www.torrentz2.eu', 'torrentz2.eu')
(7443, 'www.ndtv.com', 'm.ndtv.com')
(7451, 'www.ndtv.com', 'm.ndtv.com')
(7468, 'www.ndtv.com', 'auto.ndtv.com')
(7470, 'www.rediff.com', 'm.rediff.com')
(7479, 'www.rediff.com', 'labs.rediff.com')
(7480, 'www.rediff.com', 'labs.rediff.com')
(7481, 'www.rediff.com', 'register.rediff.com')
(7482, 'www.rediff.com', 'ishare.rediff.com')
(7514, 'www.rediff.com', 'zarabol.rediff.com')
(7516, 'www.rediff.com', 'm.rediff.com')
(7522, 'www.rediff.com', 'mypage.rediff.com')
(7585, 'www.moneycontrol.com', 'm.moneycontrol.com')
(7587, 'www.moneycontrol.com', 'm.moneycontrol.com')
(7593, 'www.moneycontrol.com', 'm.moneycontrol.com')
(7594, 'www.snapdeal.com', 'm.snapdeal.com')
(7615, 'www.msn.com-', 'www.msn.com')
(7639, 'www.makemytrip.com', 'holidayz.makemytrip.com')
(7647, 'www.justdial.com', 't.justdial.com')
(7648, 'www.justdial.com', 't.justdial.com')
(7653, 'www.justdial.com', 't.justdial.com')
(7749, 'www.justdial.com', 't.justdial.com')
(7752, 'www.softonic.com', 'features.en.softonic.com')
(7757, 'www.indianexpress.com', 'indianexpress.com')
(7758, 'www.indianexpress.com', 'indianexpress.com')
(7795, 'www.indianexpress.com', 'indianexpress.com')
(7796, 'www.indianexpress.com', 'indianexpress.com')
(78, 'comptoir-hardware.com', 'www.comptoir-hardware.com')
(7803, 'www.xhamster.com', 'm.xhamster.com')
(7804, 'www.shopclues.com', 'm.shopclues.com')
(7858, 'www.oneindia.com', 'recharge.oneindia.com')
(7902, 'www.filehippo.com', 'filehippo.com')
(7904, 'www.indiamart.com', 'm.indiamart.com')
(7909, 'www.indiamart.com', 'm.indiamart.com')
(81, 'citroen.ru', 'www.citroen.ru')
(86, 'outlook.com', 'www.outlook.com')
(89, 'ovh.com', 'www.ovh.com')
(900, 'tastebuds.fm', 'tastebuds.fm and naukri.com')
(918, 'deceeeu.ro', 'deceeu.ro')
(955, 'tastebuds.fm', 'tastebuds.fm and naukri.com , webcompat')
(964, 'www.weibo.com.com', 'www.weibo.com')
karlcow commented 7 years ago

Recording here so it's not lost. Yesterday @denschub suggested that we provide feeds only for second-level domain names, to maximize the outreach to the web developers of one company.

This could be done, i.e. instead of:

/feed/www.example.org
/feed/lab.example.org

we provide only

/feed/example.org
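That collapse could be sketched naively as taking the last two labels of the host. Real code would need to consult the Public Suffix List (e.g. via the tldextract package), since a registrable domain like example.co.uk spans three labels:

```python
def registrable_domain(host):
    """Collapse a host to its last two labels (deliberately naive sketch)."""
    labels = host.lower().rstrip('.').split('.')
    return '.'.join(labels[-2:])
```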

I personally prefer the more granular version for different reasons, but I think we could do both. Some of my reasons:

karlcow commented 6 years ago

Let me kill this with fire. :) And let's revive it once/if one day we have a DB with issues.