vaughany / yourls-bulk-import-and-shorten

A plugin to bulk import long URLs and shorten them automatically.

Bulk Import and Shorten - a YOURLS plugin

Plugin for YOURLS 1.7.x.

Description

I looked at an import/export plugin for YOURLS but it only imported what it exported (for backup and restore purposes, I assume). I needed a way of getting long URLs into YOURLS in bulk, and not one at a time or via the Bookmarklets (as the links don't exist just yet). This plugin solves that problem. If you can create a CSV file, you can use this plugin.

Note: CSV (comma-separated value) files are plain text files with columns separated by commas, fields delimited by double quotes, and rows delimited by line breaks. MS Excel, OpenOffice/LibreOffice and Google Spreadsheets can all export data in CSV format.

Installation

  1. In /user/plugins, create a new folder named bulk-import-and-shorten.
  2. Drop these files in that directory.
  3. Go to the Manage Plugins page (e.g. http://sho.rt/admin/plugins.php) and activate the plugin.
  4. Under the Manage Plugins link should be a new link called Bulk Import and Shorten.
  5. Have fun!

Configuration

This plugin has no user-configurable options. If you know what you're doing you can add to and amend this plugin, but some changes will be easier than others. If you make substantial improvements, let me know.

Use

This plugin expects you to upload a CSV file with at least one column and optional second and third columns. The first column should contain the long URL of the 'target', e.g. http://bbc.co.uk. The optional second column can contain a keyword you would like to associate with this URL, if you don't want YOURLS to generate one for you. The third column contains an optional title. If one is not supplied, YOURLS generates one for you.

Note: In this repository is a file called test.csv, which you can use as an example.
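
As a rough illustration of the column layout described above, here is a minimal Python sketch that builds such a file. The function name `build_csv` is hypothetical, and the sketch assumes the plugin expects no header row (as the keyword examples in this README suggest); check test.csv for the authoritative layout.

```python
import csv
import io


def build_csv(rows):
    """Return CSV text with one row per (url, keyword, title) tuple.

    Assumption: no header row. Leave keyword and/or title as an empty
    string if you want them generated for you.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # quotes a field only when it needs it
    for url, keyword, title in rows:
        writer.writerow([url, keyword, title])
    return buffer.getvalue()
```

For example, `build_csv([("http://bbc.co.uk/", "bbcnews", "BBC News")])` produces a single line with three comma-separated fields. Write the returned text to a `.csv` file and upload that file through the plugin page.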

No keywords

If you don't specify a keyword in the CSV file, YOURLS will make a short URL for you. By default, this will be an integer (starting at '1' when YOURLS was first installed) and will increase by 1 with each additional URL. Other keyword schemes are possible, depending on your YOURLS configuration and activated plugins.

With keywords

If you supply keywords in the second column of the CSV file, then when imported, YOURLS will handle them according to the configuration settings and activated plugins. YOURLS always strips out (and never uses) any non-alphanumeric characters (hyphens are an exception, but read on):

http://bbc.co.uk/,bbcnews --> Will import as-is.

http://bbc.co.uk/,BBC --> With the '36' setting, the UPPERCASE letters are stripped out, leaving nothing, so it will import as if no keyword had been supplied. With the '62' setting, will import as-is.

http://bbc.co.uk/,BBCnews --> Will become news (with '36' setting) or will import as-is (with '62' setting).

http://bbc.co.uk/,bbcnews --> Will import as-is.

http://bbc.co.uk/,BBC --> Will become 'bbc'. (If not using the plugin, will import as-is.)

http://bbc.co.uk/,BBCnews --> Will become 'bbcnews' and therefore a duplicate, so will not be imported. (If not using the plugin, will import as-is.)

http://bbc.co.uk/,bbc --> Will import fine.

http://bbc.co.uk/,news --> Will be rejected if YOURLS_UNIQUE_URLS is 'true', or allowed if 'false'.

http://bbc.co.uk/,bbc-news --> Without the plugin activated, will become 'bbcnews'. With the plugin activated, will import as-is.

I have talked through some common plugins and the outcomes of various situations, but please check carefully the short URLs of bulk-imported long URLs to ensure everything is as you expect. Due to the way plugins can hook into YOURLS I cannot know what plugins are installed and how they may affect how a short URL is created.
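
The outcomes listed above can be modelled roughly in a few lines of Python. This is a sketch of the behaviour as this README describes it, not YOURLS's actual code. The function `sanitize_keyword` and its parameters are hypothetical: `charset` stands for the '36' / '62' setting (which appears to correspond to the `YOURLS_URL_CONVERT` constant in the YOURLS configuration), while `force_lowercase` and `allow_hyphens` stand in for the plugins mentioned in the examples.

```python
import string


def sanitize_keyword(keyword, charset=36, force_lowercase=False, allow_hyphens=False):
    """Approximate how a supplied keyword ends up after import.

    An empty result means the keyword is effectively missing, so a
    short URL would be generated instead.
    """
    if force_lowercase:
        # Models a plugin that lowercases keywords before they are checked.
        keyword = keyword.lower()

    allowed = set(string.digits + string.ascii_lowercase)
    if charset == 62:
        allowed |= set(string.ascii_uppercase)
    if allow_hyphens:
        # Models a plugin that permits hyphens in short URLs.
        allowed.add("-")

    # Anything outside the allowed set is silently dropped.
    return "".join(ch for ch in keyword if ch in allowed)
```

Running your keyword column through a check like this before uploading can flag keywords that would be altered, emptied, or collapsed into duplicates, but the imported short URLs remain the only reliable confirmation.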

Titles

Normally, YOURLS will try to get a title for you by downloading the web page from the supplied URL and checking the raw HTML for a title. This is possibly the reason bulk importing is slow when extremely large CSV files are uploaded. This plugin prevents YOURLS from doing its job and instead creates a title from the supplied URL, or uses the one in the third column of the CSV, if present and non-empty.
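
The precedence described above can be sketched as follows. This is an illustration only: `choose_title` is a hypothetical name, and the sketch assumes the title "created from the supplied URL" is simply the URL itself, which may not match what the plugin really does.

```python
def choose_title(url, supplied_title=""):
    """Pick a title without fetching the page.

    Use the third CSV column when it is present and non-empty;
    otherwise fall back to the URL (assumption: the URL string
    itself serves as the title).
    """
    title = (supplied_title or "").strip()
    return title if title else url
```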

Troubleshooting

One user experienced timeout issues processing a CSV file containing ~60,000 rows. I've not investigated this issue thoroughly, but version 0.2 contains a line that creates a title from the supplied URL instead of letting YOURLS fetch the URL with cURL and extract a title from the returned HTML. I believe this to be faster, but I have only done a little testing.

Some ideas, if processing a large CSV file:

Note: Setting either of these to 300 / 600 / 900 seconds (5 / 10 / 15 minutes) is not unreasonable.

I've not investigated memory issues at this point, however I'd like to hear back if you have any experience of these (or any issues, for that matter).
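
If a single upload of a very large file still times out, one workaround worth trying is to split the CSV into smaller files and import them one at a time. A minimal Python sketch, assuming a UTF-8 file with no header row; the function names, the output file naming and the default chunk size are all hypothetical, and this has not been tested against the plugin.

```python
import csv
from itertools import islice


def chunk_rows(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def split_csv(src_path, rows_per_file=5000):
    """Write src_path out as numbered part files and return their paths."""
    out_paths = []
    with open(src_path, newline="", encoding="utf-8") as src:
        chunks = chunk_rows(csv.reader(src), rows_per_file)
        for index, chunk in enumerate(chunks, start=1):
            out_path = f"{src_path}.part{index}.csv"
            with open(out_path, "w", newline="", encoding="utf-8") as dst:
                csv.writer(dst).writerows(chunk)
            out_paths.append(out_path)
    return out_paths
```

Calling `split_csv("urls.csv", rows_per_file=5000)` would write `urls.csv.part1.csv`, `urls.csv.part2.csv` and so on next to the original, each of which can then be uploaded separately.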

License

Uses YOURLS' license, aka "Do whatever the hell you want with it". I borrowed some code from others (including GautamGupta who in turn borrowed from John Godley) whose code had no licence so I can't claim this whole plugin as my own work, but the lion's share of it is.

Bugs and Features

I'm always keen to add new features, improve performance and squash bugs, so if you do any of these things, let me know. Fork the project, make your changes and send me a pull request. If you find a bug but aren't a coder yourself, open an issue and give me as much detail as possible about the problem:

To-Do

History

Finally...

I hope you find this plugin useful.