
linkify-it-py


This is a Python port of linkify-it.

Link recognition library with FULL unicode support, focused on high-quality link pattern detection in plain text.

Demo

Javascript Demo

Why it's awesome:

- Full unicode support, with astral characters
- International domains support
- Allows rules extension & custom normalizers

Install

pip install linkify-it-py

or

conda install -c conda-forge linkify-it-py

Usage examples

Example 1. Simple use

from linkify_it import LinkifyIt

linkify = LinkifyIt()

print(linkify.test("Site github.com!"))
# => True

print(linkify.match("Site github.com!"))
# => [linkify_it.main.Match({
#         'schema': '',
#         'index': 5,
#         'last_index': 15,
#         'raw': 'github.com',
#         'text': 'github.com',
#         'url': 'http://github.com'
#     })]

Example 2. With options

from linkify_it import LinkifyIt
from linkify_it.tlds import TLDS

# Reload full tlds list & add unofficial `.onion` domain.
linkify = (
    LinkifyIt()
    .tlds(TLDS)               # Reload with full tlds list
    .tlds("onion", True)      # Add unofficial `.onion` domain
    .add("git:", "http:")     # Add `git:` protocol as "alias"
    .add("ftp:", None)        # Disable `ftp:` protocol
    .set({"fuzzy_ip": True})  # Enable IPs in fuzzy links (without schema)
)
print(linkify.test("Site tamanegi.onion!"))
# => True

print(linkify.match("Site tamanegi.onion!"))
# => [linkify_it.main.Match({
#         'schema': '',
#         'index': 5,
#         'last_index': 19,
#         'raw': 'tamanegi.onion',
#         'text': 'tamanegi.onion',
#         'url': 'http://tamanegi.onion'
#     })]

Example 3. Add a Twitter mentions handler

import re

from linkify_it import LinkifyIt

linkify = LinkifyIt()

def validate(obj, text, pos):
    tail = text[pos:]

    if not obj.re.get("twitter"):
        obj.re["twitter"] = re.compile(
            "^([a-zA-Z0-9_]){1,15}(?!_)(?=$|" + obj.re["src_ZPCc"] + ")"
        )
    if obj.re["twitter"].search(tail):
        # The linkifier allows punctuation before the prefix, but
        # "@@mention" is invalid, so reject a second "@" right before it.
        if pos >= 2 and text[pos - 2] == "@":
            return 0
        return len(obj.re["twitter"].search(tail).group())
    return 0

def normalize(obj, match):
    match.url = "https://twitter.com/" + re.sub(r"^@", "", match.url)

linkify.add("@", {"validate": validate, "normalize": normalize})
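The validator's core contract — return the matched length at a position, or 0 on fail — can be sketched standalone with plain re. This is a hypothetical, self-contained version: the `\s` class stands in for the library's internal src_ZPCc pattern, and the names are illustrative, not part of linkify-it-py's API.

```python
import re

# A handle is 1-15 word characters, not ending in "_", followed by
# end-of-string or whitespace (stand-in for the src_ZPCc class).
HANDLE_RE = re.compile(r"^([a-zA-Z0-9_]){1,15}(?!_)(?=$|\s)")

def handle_length(text, pos):
    """Return the length of a handle starting at `pos`, or 0 on fail."""
    m = HANDLE_RE.search(text[pos:])
    return len(m.group()) if m else 0

print(handle_length("mention @alice_b now", 9))  # → 7 (len of "alice_b")
print(handle_length("mention @ now", 9))         # → 0 (no handle follows)
```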

API

API documentation

LinkifyIt(schemas, options)

Creates a new linkifier instance with optional additional schemas.

By default it understands:

- http(s)://..., ftp://..., mailto:... and //... links
- "fuzzy" links and emails (example.com, foo@bar.com)

schemas is a dict, where each key/value pair describes a protocol/rule:

- key - link prefix, usually a protocol name with : at the end (skype:, for example)
- value - a str to alias to another schema, or a dict with a validate function and an optional normalize function

options:

- fuzzy_link - recognize URLs without a schema (example.com). Default True.
- fuzzy_ip - allow IPs in fuzzy links. Default False.
- fuzzy_email - recognize emails without the mailto: prefix. Default True.

.test(text)

Searches for a linkifiable pattern and returns True on success, False otherwise.

.pretest(text)

Quick check whether a link may exist in the text. Can be used to optimize more expensive .test() calls. Returns False if no link can possibly be found, and True if .test() is needed to know for sure.
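A pre-check of this kind is essentially one cheap regex scan over the text. A hypothetical sketch (the pattern below is illustrative, not the library's actual one):

```python
import re

# Cheap pre-check: anything that could possibly start a link — a colon,
# a slash, an "@", or a dotted word pair like "example.com".
PRETEST_RE = re.compile(r"[@:/]|\w\.\w")

def pretest(text):
    """Return False if no link can exist; True if a full scan is worth it."""
    return PRETEST_RE.search(text) is not None

print(pretest("plain words only"))    # → False, skip the expensive scan
print(pretest("maybe example.com?"))  # → True, "e.c" looks link-like
```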

.test_schema_at(text, name, position)

Similar to .test() but checks only a specific protocol tail, exactly at the given position. Returns the length of the found pattern (0 on fail).

.match(text)

Returns a list of found link matches, or None if nothing is found.

Each match has:

- schema - prefix of the matched string (empty string for fuzzy links)
- index - start index of the match
- last_index - index of the next character after the match end
- raw - matched text
- text - normalized text
- url - normalized url

.match_at_start(text)

Checks if a match exists at the start of the string. Returns a Match (see the docs for .match(text)) or None if no URL is at the start. Doesn't work with fuzzy links.

.tlds(list_tlds, keep_old=False)

Load (or merge, when keep_old=True) a new top-level domains list. TLDs are needed for fuzzy links (without a schema) to avoid false positives. By default, a short list of popular zones is used.

If that's not enough, you can reload the defaults with a more detailed zones list (the linkify_it.tlds.TLDS constant used in Example 2).
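The replace-versus-merge behavior can be sketched as a plain set operation. The default list here is an illustrative three-zone subset, and the helper name is hypothetical:

```python
DEFAULT_TLDS = ["com", "net", "org"]  # illustrative subset of the default

def reload_tlds(current, new_tlds, keep_old=False):
    """Sketch of .tlds(): replace the zone list, or merge when keep_old=True."""
    new_tlds = [new_tlds] if isinstance(new_tlds, str) else list(new_tlds)
    return sorted(set(current) | set(new_tlds)) if keep_old else sorted(new_tlds)

print(reload_tlds(DEFAULT_TLDS, "onion", keep_old=True))
# → ['com', 'net', 'onion', 'org']
print(reload_tlds(DEFAULT_TLDS, ["dev", "app"]))
# → ['app', 'dev']  (defaults replaced entirely)
```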

.add(key, value)

Add a new schema to the schemas object. As described in the constructor definition, key is a link prefix (skype:, for example), and value is a str to alias to another schema, or a dict with validate and optionally normalize definitions. To disable an existing rule, use .add(key, None).
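The three value kinds — alias string, rule dict, and None — behave like a plain dict update. This is a conceptual model of the schemas table, not the library's internals:

```python
# Conceptual model of the schemas table behind .add():
schemas = {}

def add(key, value):
    """str value = alias, dict value = custom rule, None = disable the rule."""
    if value is None:
        schemas.pop(key, None)
    else:
        schemas[key] = value

add("git:", "http:")                         # alias git: to http:
add("skype:", {"validate": r"^[a-z0-9.]+"})  # custom rule with a validator
add("git:", None)                            # disable git: again
print(sorted(schemas))  # → ['skype:']
```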

.set(options)

Override default options. Properties you don't pass remain unchanged.
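That merge semantics is a shallow dict update — only the keys you pass change. Option names follow Example 2 above; the defaults shown here are assumptions for illustration:

```python
# Assumed defaults for illustration; fuzzy_ip appears in Example 2 above.
options = {"fuzzy_link": True, "fuzzy_email": True, "fuzzy_ip": False}

def set_options(overrides):
    """Sketch of .set(): properties not supplied keep their current value."""
    options.update(overrides)

set_options({"fuzzy_ip": True})
print(options["fuzzy_ip"], options["fuzzy_link"])  # → True True
```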

License

MIT