yachielab / QUEEN

QUEEN: a framework to generate quinable and efficiently editable nucleotide sequence resources
MIT License
37 stars 8 forks source link
bioinformatics biopython jupyter-notebooks plasmids

QUEEN Installation and User Manual

QUEEN (a framework to generate quinable and efficiently editable nucleotide sequence resources) is a Python programming module designed to describe, share credit DNA building processes and resources. DNA parts information can be imported from external annotated DNA files (GenBank and FASTA format). Output file (GenBank format) encodes the complete information of the constructed DNA and its annotations and enables the production of a quine code that self-reproduces the output file itself. In QUEEN, all of the manipulations required in DNA construction are covered by four simple operational functions, "cutdna", "modifyends", "flipdna", and "joindna" that can collectively represent any of the standard molecular DNA cloning processes, two search functions, "searchsequence" and "searchfeature", and two super functions, "editsequence" and "editfeature". A new DNA can be designed by programming a Python script or using Jupyter Notebook, an interactive Python programming interpreter. The designed DNA product can be output in the GenBank file format that involves the history of its building process. The "quinable" feature of a QUEEN-generated GenBank file certifies that the annotated DNA material information and its production process are fully transparent, reproducible, inheritable, and modifiable by the community.

If you've found QUEEN is useful for your research, please consider citing our paper published in Nature Communications.

Mori, H., Yachie, N. A framework to efficiently describe and share reproducible DNA materials and construction protocols. Nat Commun 13, 2894 (2022). https://doi.org/10.1038/s41467-022-30588-x

News

qexperiment module was added

qexperiment module enable users to easily describe and simulate experimental molecular cloing process with newly implemented methods based on actual experimental methods. For now, the following methods can be available.

For deatails, please see qexperiment_usage.md, qexperiment_demo.ipynb and Google colab notebook.

Change log

Please see changelog.md.

Software dependency

Python 3.7.0 or later

Installation

  1. Install QUEEN using the following command.
    For the official release (v1.1.0) on the Python Package Index

    pip install python-queen 

    For the developmental version on GitHub

    pip install git+https://github.com/yachielab/QUEEN.git@(branch name) 
  2. Install Graphviz (optional; required for visualizing flowcharts of DNA building processes using the visualizeflow() function described below). Graphviz package is available at the following link.

Usage

QUEEN provides the QUEEN class to define a double-stranded (ds)DNA object with sequence annotations. The QUEEN class and its operational functions are described below. Jupyter Notebook files for all of the example codes are provided in ./demo/tutorial of QUEEN (https://github.com/yachielab/QUEEN) and made executable in Google Colaboratory.

Command line interface
A part of QUEEN functions can also be used from the command line interface (CLI) instead of describing python codes.
For details, please see CLI_usage.md

Simulators for general molecular cloning methods
simple molecular cloning simulators for both homology-based and digestion/ligation-based assembly are provided on Google colab. By using these simulators, you can exeperience the benefits of QUEEN without describing python codes.

QUEEN class

The QUEEN class can define a dsDNA object with sequence annotations. It can be created by specifying a DNA sequence or importing a sequence data file in GenBank or FASTA file format (single sequence entry). When a GenBank format file is imported, its NCBI accession number, Addgene plasmid ID, or Benchling share link can be provided instead of downloading the file to your local environment.

Example code 1: Create a QUEEN class object (blunt-ends)

A QUEEN_object (blunt-end) is created by providing its top-stranded sequence (5’-to-3’). By default, the DNA topology will be linear.
(Expected runtime: less than 1 sec.)

Source code

from QUEEN.queen import *
dna = QUEEN(seq="CCGGTATGCGTCGA") 

Example code 2: Create a QUEEN class object (sticky-end)

The left and right values separated by "/" show the top and bottom strand sequences of the generating QUEEN_object, respectively. The top strand sequence is provided in the 5’-to-3’ direction from left to right, whereas the bottom strand sequence is provided in the 3′-to-5′ direction from left to right. Single-stranded regions can be provided by "-" for the corresponding nucleotide positions on the opposite strands. A:T and G:C base-pairing rule is required between the two strings except for the single-stranded positions.
(Expected runtime: less than 1 sec.)

Source code

from QUEEN.queen import *
dna = QUEEN(seq="CCGGTATGCG----/----ATACGCAGCT") 

Example code 3.1: Create a circular QUEEN class object

The sequence topology of generating QUEEN_object can be specified by "linear" or"circular".
(Expected runtime: less than 1 sec.)

Source code

from QUEEN.queen import *
dna = QUEEN(seq="CCGGTATGCGTCGA", topology="circular") 

Example code 3.2: Create a ssDNA QUEEN class object *(available from v1.1)

The single strand QUEEN_object can be generated by specifying ssdna=True.
(Expected runtime: less than 1 sec.)

Source code

from QUEEN.queen import *
dna = QUEEN(seq="CCGGTATGCGTCGA", ssdna=True) 

Example code 4.1: Create a QUEEN class object from a GenBank file in a local directory

GenBank file can be loaded by specifying its local file path.
(Expected runtime: less than 1 sec.)

Source code

from QUEEN.queen import *
pUC19 = QUEEN(record="./input/pUC19.gbk")

Example code 4.2: Create a QUEEN class object using a NCBI accession number

QUEEN_object can be generated from a NCBI accession number with dbtype="ncbi".
(Expected runtime: less than 1 sec.)

Source code

from QUEEN.queen import *
#"M77789.2" is NCBI accession number for pUC19 plasmid
pUC19 = QUEEN(record="M77789.2", dbtype="ncbi") 

Example code 4.3: Create a QUEEN class object using an Addgene plasmid ID

QUEEN_object can be generated from an Addgene plasmid ID with dbtype="addgene".
(Expected runtime: less than 1 sec.)

Source code

from QUEEN.queen import *
#"50005" is Addgene plasmid ID for pUC19 plasmid
pUC19 = QUEEN(record="50005", dbtype="addgene")

Example code 4.4: Create a QUEEN class object from a Benchling share link

QUEEN_object can be generated from a Benchling shared link with dbtype="benchling".
(Expected runtime: less than 1 sec.)

Source code

from QUEEN.queen import *
plasmid = QUEEN(record="https://benchling.com/s/seq-U4pePb09KHutQzjyOPQV", dbtype="benchling")

pX330 plasmid encoding a Cas9 gene and a gRNA expression unit is provided in the above example. The QUEEN_object generated here is used in the following example codes in this document.

Properties of QUEEN class objects

Output functions

QUEEN_objects hold a simple set of functions to output its information.

Example code 5: Print a dsDNA object

(Expected runtime: less than 1 sec.)

Source code

from queen import *
fragment = QUEEN(seq="CCGGTATGCG----/----ATACGCAGCT") 
fragment.printsequence(display=True)

Output

5′ CCGGTATGCG---- 3′
3′ ----ATACGCAGCT 5′

Search Function

QUEEN_objects hold .searchsequene() and .searchfeature() functions that enables users to search for query sequences and values in DNAfeature_objects.

Example code 8: Search for a DNA sequence motif with fuzzy matching

Search for "AAAAAAAA" sequence, permitting a single nucleotide mismatch.
(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

match_list = plasmid.searchsequence(query="(?:AAAAAAAA){s<=1}")
plasmid.printfeature(match_list, seq=True) 

Output

feature_id  feature_type  qualifiers:label  start  end   strand  sequence  
null        misc_feature  null              5484   5492  +       AAAAAAGA  
null        misc_feature  null              6369   6377  +       AACAAAAA  
null        misc_feature  null              7872   7880  +       AAACAAAA  
null        misc_feature  null              346    354   -       AAAACAAA  
null        misc_feature  null              799    807   -       AAAAAATA  
null        misc_feature  null              1201   1209  -       GAAAAAAA  
null        misc_feature  null              6716   6724  -       AAAAATAA  
null        misc_feature  null              7844   7852  -       AGAAAAAA 

Example code 9: Search for a DNA sequence with the IUPAC nucleotide code

(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

match_list = plasmid.searchsequence(query="SWSWSWDSDSBHBRHH")
plasmid.printfeature(match_list, seq=True)

Output

feature_id  feature_type  qualifiers:label  start  end   strand  sequence          
null        misc_feature  null              4098   4114  +       GAGACAGCTGGTGGAA  
null        misc_feature  null              3550   3566  -       CTGTCTGCAGGATGCC  
null        misc_feature  null              5239   5255  -       CTCTGATGGGCTTATC  
null        misc_feature  null              6415   6431  -       GAGAGTGCACCATAAA  
null        misc_feature  null              8357   8373  -       GTCAGAGGTGGCGAAA  

Example code 10: Search for sequence features having specific attribute values

Search for DNAfeature_objects with a feature type "primer_bind", and then further screen ones holding a specific string in "qualifier:label".
(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

feature_list = plasmid.searchfeature(key_attribute="feature_type", query="primer_bind")
plasmid.printfeature(feature_list)
sub_feature_list = plasmid.searchfeature(key_attribute="qualifier:label", query=".+-R$", source=feature_list)
plasmid.printfeature(sub_feature_list)

Output

feature_id  feature_type  qualifiers:label  start  end   strand  
200         primer_bind   hU6-F            0      21    +       
300         primer_bind   LKO.1 5'         171    191   +       
1200        primer_bind   BGH-rev          5524   5542  -       
1700        primer_bind   F1ori-R          6048   6068  -       
1800        primer_bind   F1ori-F          6258   6280  +       
1900        primer_bind   pRS-marker       6433   6453  -       
2000        primer_bind   pGEX 3'          6552   6575  +       
2100        primer_bind   pBRforEco        6612   6631  -       
2400        primer_bind   Amp-R            7021   7041  -       
2600        primer_bind   pBR322ori-F      8323   8343  +       

feature_id  feature_type  qualifiers:label  start  end   strand  
1700        primer_bind   F1ori-R          6048   6068  -       
2400        primer_bind   Amp-R            7021   7041  -   

Operational functions

QUEEN objects can be manipulated by four simple operational functions, cutdna(), modifyends(), flipdna(), and joindna(), that can collectively represent any of the standard molecular DNA cloning processes, and two super functions, editsequence() and editfeature().

Example code 11: Cut pX330 plasmid at multiple positions

Cut a circular plasmid px330 at the three different positions, resulting in the generation of three fragments. Then, cut one of the three fragments again.
(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

print(plasmid)
fragments = cutdna(plasmid ,1000, 2000, 4000)
print(fragments)
fragment3, fragment4 = cutdna(fragments[1], 500)
print(fragment3)
print(fragment4)

Output

<queen.QUEEN object; project='pX330', length='1000 bp', topology='linear' >, <queen.QUEEN object; project='pX330', length='2000 bp', topology='linear' >, <queen.QUEEN object; project='pX330', length='5484 bp', topology='linear' >]
<queen.QUEEN object; project='pX330', length='500 bp', topology='linear' >
<queen.QUEEN object; project='pX330', length='1500 bp', topology='linear' >

If an invalid cut pattern are provided, an error message will be returned.

Source code (continued from the previous code)

fragments = cutdna(plasmid, *["50/105", "100/55", "120/110"])

Error message

ValueError: Invalid cut pattern.

Example code 12: Digest pX330 plasmid by EcoRI

Digestion of pX330 plasmid with EcoRI can be simulated as follows.

  1. Search for EcoRI recognition sites in pX330 with its cut motif and obtain the DNAfeature_objects representing its cut position(s) and motif.

  2. Use the DNAfeature_objects to cut pX330 by cutdna().

    (Expected runtime: less than 1 sec.)

    Source code (continued from the previous code)

    sites     = plasmid.searchsequence("G^AATT_C")
    fragments = cutdna(plasmid, *sites)
    for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10)

    Output

    <queen.QUEEN object; project='pX330', length='8488 bp', topology='linear' >
    5' AATTCCTAGA...AGTAAG---- 3'
    3' ----GGATCT...TCATTCTTAA 5'

    QUEEN provides a library of restriction enzyme motifs (described in the New England Biolab's website).

    Source code (continued from the previous code)

    from QUEEN import cutsite #Import a restriction enzyme library
    sites = plasmid.searchsequence(cutsite.lib["EcoRI"])
    fragments = cutdna(plasmid, *sites)
    for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10) 

    Output

    <queen.QUEEN object; project='pX330', length='8488 bp', topology='linear' >
    5' AATTCCTAGA...AGTAAG---- 3'
    3' ----GGATCT...TCATTCTTAA 5'

Example code 13: Digest pX330 plasmid by Type-IIS restriction enzyme BbsI

(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

sites = plasmid.searchsequence("GAAGAC(2/6)")
fragments = cutdna(plasmid,*sites)
for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10)

Output

<queen.QUEEN object; project='pX330', length='8466 bp', topology='linear' >
5' GTTTTAGAGC...ACGAAA---- 3'
3' ----ATCTCG...TGCTTTGTGG 5'

<queen.QUEEN object; project='pX330', length='26 bp', sequence='CACCGGGTCTTCGAGAAGACCTGTTT', topology='linear'>
5' CACCGGGTCT...AGACCT---- 3'
3' ----CCCAGA...TCTGGACAAA 5'

Here, the BbsI recognition motif can also be represented by "(6/2)GTCTTC", "GAAGACNN^NNNN_" or "^NNNN_NNGTCTTC". The BbsI recognition motif is also available from the library of restriction enzyme motifs.

Source code (continued from the previous code)

from QUEEN import cutsite #Import a restriction enzyme library 
sites = plasmid.searchsequence(cutsite.lib["BbsI"])
fragments = cutdna(plasmid, *sites)
for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10) 

Output

<queen.QUEEN object; project='pX330', length='8466 bp', topology='linear' >
5' GTTTTAGAGC...ACGAAA---- 3'
3' ----ATCTCG...TGCTTTGTGG 5'

<queen.QUEEN object; project='pX330', length='26 bp', sequence='CACCGGGTCTTCGAGAAGACCTGTTT', topology='linear'>
5' CACCGGGTCT...AGACCT---- 3'
3' ----CCCAGA...TCTGGACAAA 5'

Example code 14: Crop a sequence fragment within a specified region

If the second fragment of "Example code 11" is for further manipulation, cropdna() is convenient.
(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

fragment = cropdna(plasmid ,2000, 4000)
print(fragment)

Output

<queen.QUEEN object; project='pX330', length='2000 bp', topology='linear' >

If a start position is larger than an end position, an error message will be returned.

Source code (continued from the previous code)

fragment = cropdna(fragment, 1500, 1000)

Error message

ValueError: 'end' position must be larger than 'start' position.

Example code 15: Trim nucleotides from a blunt-ended dsDNA to generate a sticky-ended dsDNA

Sticky ends can be generated by trimming nucleotides where their end structures are given by top and bottom strand strings with "*" and "-" separated by "/", respectively. The letters "-" indicate nucleotide letters to be trimmed, and the letters "*" indicate ones to remain.
(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

fragment = cropdna(plasmid, 100, 120)
fragment.printsequence(display=True)
fragment = modifyends(fragment, "-----/*****", "**/--")
fragment.printsequence(display=True)

Output

5' CTTAACGTTGGCTTGCCACG 3'
3' GAATTGCAACCGAACGGTGC 5'

5' ----ACGTTGGCTTGCCACG 3'
3' GAATTGCAACCGAACGGT-- 5'

The following codes achieve the same manipulation.
Source code (continued from the previous code)

fragment = cropdna(plasmid,'105/100', '120/118')
fragment.printsequence(display=True)

A regex-like format can also be used.
Source code (continued from the previous code)

fragment = modifyends(fragment, "-{5}/*{5}","*{2}/-{2}")
fragment.printsequence(display=True)

If a QUEEN object with circular topology is given, an error message will be returned.
Source code (continued from the previous code)

fragment = modifyends(plasmid, "-----/*****", "**/--")

Error message

ValueError: End sequence structures cannot be modified. The topology of the QUEEN_object is circular.

Example code 16: Add adapter sequences

modifyends() can also add adapter sequences to DNA ends.
(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

#Add blunt-ended dsDNA sequences to both ends
fragment = cropdna(plasmid, 100, 120)
fragment = modifyends(fragment,"TACATGC","TACGATG")
fragment.printsequence(display=True)
#Add sticky-ended dsDNA sequences to both ends
fragment = cropdna(plasmid, 100, 120)
fragment = modifyends(fragment,"---ATGC/ATGTACG","TACG---/ATGCTAC")
fragment.printsequence(display=True)

Output

5' TACATGCTACAAAATACGTGACGTAGATACGATG 3'
3' ATGTACGATGTTTTATGCACTGCATCTATGCTAC 5'

5' ---ATGCTACAAAATACGTGACGTAGATACG--- 3'
3' ATGTACGATGTTTTATGCACTGCATCTATGCTAC 5'

Example code 17: Clone an EGFP fragment into pX330

  1. Generate a QUEEN class object for an EGFP fragment,

  2. Create EcoRI sites to both ends of the EGFP fragment,

  3. Digest the EGFP fragment and pX330 by EcoRI, and

  4. Assemble the EGFP fragment and linearized pX330.  

    (Expected runtime: less than 1 sec.)

    Source code (continued from the previous code)

    EGFP     = QUEEN(record="input/EGFP.fasta")
    EGFP     = modifyends(EGFP, cutsite.lib["EcoRI"].seq, cutsite.lib["EcoRI"].seq)
    sites    = EGFP.searchsequence(cutsite.lib["EcoRI"]) 
    insert   = cutdna(EGFP, *sites)[1]
    insert.printsequence(display=True, hide_middle=10)
    sites    = plasmid.searchsequence(cutsite.lib["EcoRI"])
    backbone = cutdna(plasmid, *sites)[0]
    backbone.printsequence(display=True, hide_middle=10)
    pEGFP    = joindna(backbone, insert, topology="circular") 
    print(plasmid)
    print(EGFP)
    print(pEGFP) 

    Output

    5′ AATTCGGCAG...ACAAGG---- 3′
    3′ ----GCCGTC...TGTTCCTTAA 5′
    
    5′ AATTCCTAGA...AGTAAG---- 3′
    3′ ----GGATCT...TCATTCTTAA 5′
    
    <queen.QUEEN object; project='pX330', length='8484 bp', topology='circular'>
    <queen.QUEEN object; project='EGFP', length='789 bp', topology='linear'>
    <queen.QUEEN object; project='pX330', length='9267 bp', topology='circular'>

    If connecting DNA end structures of the input QUEEN_object are not compatible, an error message will be returned.
    Source code (continued from the previous code

    EGFP     = QUEEN(record="input/EGFP.fasta")
    EGFP     = modifyends(EGFP, cutsite.lib["BamHI"].seq, cutsite.lib["BamHI"].seq)
    sites    = EGFP.searchsequence(cutsite.lib["BamHI"]) 
    insert   = cutdna(EGFP, *sites)[1]
    insert.printsequence(display=True, hide_middle=10)/
    pEGFP    = joindna(backbone, insert, topology="circular") 

    Error message

    ValueError: The QUEEN_objects cannot be joined due to the end structure incompatibility.

    Example code 18: Create a gRNA expression plasmid

    pX330 serves as a standard gRNA expression backbone plasmid. A gRNA spacer can simply be cloned into a BbsI-digested destination site of pX330 as follows:

  5. Generate QUEEN object for a sticky-ended gRNA spacer dsDNA,

  6. Digest pX330 by BbsI, and

  7. Assemble the spacer with the BbsI-digested pX330.  

    (Expected runtime: less than 1 sec.)

    Source code (continued from the previous code)

    gRNA_top    = QUEEN(seq="CACCGACCATTGTTCAATATCGTCC", ssdna=True)
    gRNA_bottom = QUEEN(seq="AAACGGACGATATTGAACAATGGTC", ssdna=True)
    gRNA        = joindna(gRNA_top, gRNA_bottom, 
                    supfeature={"feature_id":"gRNA-1", "feature_type":"gRNA", "qualifier:label":"gRNA"})
    gRNA.printsequence(display=True)
    
    sites       = plasmid.searchsequence(cutsite.lib["BbsI"])
    fragments   = cutdna(plasmid, *sites)
    backbone    = fragments[0] if len(fragments[0].seq) > len(fragments[1].seq) else fragment[1]
    pgRNA       = joindna(gRNA, backbone, topology="circular", product="pgRNA")
    
    pgRNA.printfeature()
    print(backbone)
    print(insert)
    print(pgRNA) 

    Output

    5' CACCGACCATTGTTCAATATCGTCC---- 3'
    3' ----CTGGTAACAAGTTATAGCAGGCAAA 5'
    
    feature_id  feature_type   qualifier:label     start  end   strand  
    0           primer_bind    hU6-F               0      21    +       
    100         promoter       U6 promoter         0      241   +       
    200         source         source              0      249   +       
    300         primer_bind    LKO.1 5'            171    191   +       
    gRNA-1      gRNA           gRNA                245    274   +       
    500         misc_RNA       gRNA scaffold       270    346   +       
    600         source         source              270    8487  +       
    700         enhancer       CMV enhancer        442    728   +       
    800         intron         hybrid intron       986    1214  +       
    900         regulatory     Kozak sequence      1225   1235  +       
    1000        CDS            3xFLAG              1234   1300  +       
    1100        CDS            SV40 NLS            1306   1327  +       
    1200        CDS            Cas9                1351   5452  +       
    1300        CDS            nucleoplasmin NLS   5452   5500  +       
    1400        primer_bind    BGH-rev             5527   5545  -       
    1500        polyA_signal   bGH poly(A) signal  5533   5741  +       
    1600        repeat_region  AAV2 ITR            5749   5879  +       
    1700        repeat_region  AAV2 ITR            5749   5890  +       
    1800        rep_origin     f1 ori              5964   6420  +       
    1900        primer_bind    F1ori-R             6051   6071  -       
    2000        primer_bind    F1ori-F             6261   6283  +       
    2100        primer_bind    pRS-marker          6436   6456  -       
    2200        primer_bind    pGEX 3'             6555   6578  +       
    2300        primer_bind    pBRforEco           6615   6634  -       
    2400        promoter       AmpR promoter       6701   6806  +       
    2500        CDS            AmpR                6806   7667  +       
    2600        primer_bind    Amp-R               7024   7044  -       
    2700        rep_origin     ori                 7837   8426  +       
    2800        primer_bind    pBR322ori-F         8326   8346  +       
    
    <queen.QUEEN object; project='pX330_26', length='8466 bp', topology='linear'>
    <queen.QUEEN object; project='EGFP_2', length='787 bp', topology='linear'>
    <queen.QUEEN object; project='pgRNA', length='8487 bp', topology='circular'>

    Example code 19: Flip ampicillin-resistant gene in pX330

  8. Search for the ampicillin-resistant gene in pX330,

  9. Cut pX330 with start and end positions of the ampicillin-resistant gene,

  10. Flip the ampicillin-resistant gene fragment, and

  11. Join it with the other fragment.  

    (Expected runtime: less than 1 sec.)

    Source code (continued from the previous code)

    site         = plasmid.searchfeature(query="^AmpR$")[0]
    fragments    = cutdna(plasmid, site.start, site.end)
    fragments[0] = flipdna(fragments[0])
    new_plasmid  = joindna(*fragments, topology="circular")
    plasmid.printfeature(plasmid.searchfeature(query="^AmpR$"))
    new_plasmid.printfeature(new_plasmid.searchfeature(query="^AmpR$"))  

    Output

    feature_id  feature_type  qualifiers:label  start  end   strand  
    2300        CDS           AmpR              6803   7664  +       
    
    feature_id  feature_type  qualifiers:label  start  end   strand  
    2400        CDS           AmpR              6803   7664  -  
    • editsequence(input=QUEEN object, source_sequence=regex or str, destination_sequence=str, start=int, end=int, strand=int)

    Edit sequence of QUEEN_object by searching target sequence fragments matched to a source_sequence and replacing each of them with a destination_sequence. All DNAfeature_objects located on the edited sequence regions will be given the "qualifier:broken-feature" attribute. In any sequence edit that confers change in the sequence length of the QUEEN object, the coordinates of all affected DNAfeature_objects will be adjusted. This is the parental function of searchsequence(). If destination_sequence is not provided, it works just as searchsequence().

    Parameters

    • input: QUEEN object

    • source_sequence: regex or str (default: ".+")
      Source sequence(s) to be replaced. If the value is not provided, the entire QUEEN_object sequence will be replaced with a destination_sequence. It allows fuzzy matching and regular expression. For details, see https://pypi.org/project/regex/. All IUPAC nucleotide symbols can also be used. Substrings of the regex value can be isolated by enclosing them in parentheses. Each pair of parentheses is indexed sequentially by numbers from left to right. Isolated substrings can be replaced at once by providing a destination_sequence where each substring replacement is designated, referring to the index numbers. For details, see https://docs.python.org/3/library/re.html#re.sub

    • destination_sequence: str (default: None)
      Destination sequence.

    • start:int (zero-based indexing; default: 0)
      Start position of the target range of the QUEEN_object sequence to be searched for the replacement.

    • end:int (zero-based indexing; default: the last sequence position of QUEEN_object)
      End position of the target range of the QUEEN_object sequence to be searched for the replacement.

    • strand: int: 1 (top strand only), -1 (bottom strand only), or 2 (both strands) (default: 2)
      Sequence strand to be searched for the replacement.

    Return

    If destination_sequence is not provided, it will act as searchsequence() and return a list of DNAFeature_objects. Otherwise, QUEEN_object .

Example code 20: Insert an EGFP sequence into pX330

An EGFP sequence insertion to the EcoRI site demonstrated in Example code17 can be described with a simpler code using editsequence().
(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

EGFP  = QUEEN(record="input/EGFP.fasta")
pEGFP = editsequence(plasmid, "({})".format(cutsite.lib["EcoRI"].seq), r"\1{}\1".format(EGFP.seq))
print(plasmid)
print(pEGFP)

Output

<queen.QUEEN object; project='pX330', length='8484 bp', topology='circular'>
<queen.QUEEN object; project='pX330', length='9267 bp', topology='linear'>

Example code 21: Insert a DNA string "AAAAA" to the 5’ end of every CDS

(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

new_plasmid = editfeature(plasmid, key_attribute="feature_type", query="CDS", 
                           strand=1, target_attribute="sequence", operation=replaceattribute(r"(.+)", r"AAAAA\1"))
for feat in new_plasmid.searchfeature(key_attribute="feature_type", query="CDS", strand=1):
    print(feat.start, feat.end, new_plasmid.printsequence(feat.start, feat.start+20, strand=1), feat.qualifiers["label"][0], sep="\t")

Output

1231    1302    AAAAAGACTATAAGGACCAC    3xFLAG
1308    1334    AAAAACCAAAGAAGAAGCGG    SV40 NLS
1358    5464    AAAAAGACAAGAAGTACAGC    Cas9
5464    5517    AAAAAAAAAGGCCGGCGGCC    nucleoplasmin NLS
6823    7689    AAAAAATGAGTATTCAACAT    AmpR

Example code 22: Convert the feature type of every annotation from "CDS" to "gene"

(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

new_plasmid = editfeature(plasmid, key_attribute="feature_type", query="CDS", 
target_attribute="feature_type", operation=replaceattribute("gene"))
new_plasmid.printfeature()

Output

feature_id  feature_type   qualifier:label     start  end   strand  
1           source         null                0      8484  +       
100         promoter       U6 promoter         0      241   +       
200         primer_bind    hU6-F               0      21    +       
300         primer_bind    LKO.1 5'            171    191   +       
400         misc_RNA       gRNA scaffold       267    343   +       
500         enhancer       CMV enhancer        439    725   +       
600         intron         hybrid intron       983    1211  +       
700         regulatory     null                1222   1232  +       
800         gene           3xFLAG              1231   1297  +       
900         gene           SV40 NLS            1303   1324  +       
1000        gene           Cas9                1348   5449  +       
1100        gene           nucleoplasmin NLS   5449   5497  +       
1200        primer_bind    BGH-rev             5524   5542  -       
1300        polyA_signal   bGH poly(A) signal  5530   5738  +       
1400        repeat_region  AAV2 ITR            5746   5887  +       
1500        repeat_region  AAV2 ITR            5746   5876  +      
1600        rep_origin     f1 ori              5961   6417  +       
1700        primer_bind    F1ori-R             6048   6068  -       
1800        primer_bind    F1ori-F             6258   6280  +       
1900        primer_bind    pRS-marker          6433   6453  -       
2000        primer_bind    pGEX 3'             6552   6575  +       
2100        primer_bind    pBRforEco           6612   6631  -       
2200        promoter       AmpR promoter       6698   6803  +       
2300        gene           AmpR                6803   7664  +       
2400        primer_bind    Amp-R               7021   7041  -       
2500        rep_origin     ori                 7834   8423  +       
2600        primer_bind    pBR322ori-F         8323   8343  +     

Example code 23: Add single cutter annotations to pX330

  1. Search for all of the single restriction enzyme cutters in pX330 using the library of restriction enzymes listed on the website of NEW England Biolabs.

  2. Add the single cutter annotations to pX330.

    (Expected runtime: less than 1 sec.)

    Source code (continued from the previous code)

    unique_cutters = []
    for key, re in cutsite.lib.items():
    sites = plasmid.searchsequence(re.cutsite)
    if len(sites) == 1: 
        unique_cutters.append(sites[0])
    else:
        pass
    new_plasmid = editfeature(plasmid, source=unique_cutters, target_attribute="feature_id", operation=createattribute("RE"))
    new_plasmid = editfeature(new_plasmid, key_attribute="feature_id", query="RE", target_attribute="feature_type", operation=replaceattribute("misc_bind"))
    features    = new_plasmid.searchfeature(key_attribute="feature_type", query="misc_bind")
    new_plasmid.printfeature(features, seq=True)

    Output

    RE-1        misc_bind     Acc65I           433    439   +       GGTACC        
    RE-2        misc_bind     AgeI             1216   1222  +       ACCGGT        
    RE-3        misc_bind     ApaI             2700   2706  +       GGGCCC        
    RE-4        misc_bind     BglII            1595   1601  +       AGATCT        
    RE-5        misc_bind     BsaBI            4839   4849  +       GATCACCATC    
    RE-6        misc_bind     BseRI            1098   1104  -       GAGGAG        
    RE-7        misc_bind     BsmI             4979   4985  +       GAATGC        
    RE-8        misc_bind     CspCI            4127   4139  +       CAAAGCACGTGG  
    RE-9        misc_bind     EcoRI            5500   5506  +       GAATTC        
    RE-10       misc_bind     EcoRV            3196   3202  +       GATATC        
    RE-11       misc_bind     FseI             5472   5480  +       GGCCGGCC      
    RE-12       misc_bind     FspI             7365   7371  +       TGCGCA        
    RE-13       misc_bind     KasI             5887   5893  +       GGCGCC        
    RE-16       misc_bind     NotI             5738   5746  +       GCGGCCGC      
    RE-17       misc_bind     PaqCI            1184   1191  +       CACCTGC       
    RE-19       misc_bind     PmlI             4132   4138  +       CACGTG        
    RE-20       misc_bind     PsiI             6317   6323  +       TTATAA        
    RE-22       misc_bind     PvuI             7218   7224  +       CGATCG        
    RE-23       misc_bind     SacII            7522   7528  +       CCGCGG        
    RE-24       misc_bind     SbfI             5879   5887  +       CCTGCAGG      
    RE-26       misc_bind     SnaBI            698    704   +       TACGTA        
    RE-27       misc_bind     XbaI             427    433   +       TCTAGA

Common parameters of the quinable functions

DNA construction process achieved by QUEEN() for genearating QUEEN object, the search functions searchsequence() and searchfeature(), operational functions cutdna(), cropdna(), modifyends(), flipdna(), and joindna() and super functions editsequence() and editfeature() described up to here can progressively be recorded into the manipulating QUEEN object, which enables to generate a quine code that replicates the same QUEEN object by the quine() function described below. From here, we call these functions "quinable" functions.

In addition to the parameters and options described above for the quinable functions, all of them can commonly take the five parameters.
The process_name, process_description, and product, that enable annotation and structured visualization of the construction process (see below). The three optional parameters do not affect the behavior of the quinable functions. Then, from ver 1.1, the additional two common parameters quianable and supfeature are added (see below)

Quine

quine (input=QUEEN_object, output=str, process_description=bool, execution=bool)

Generate "quine code" of QUEEN_object that produces the same QUEEN_object. A quine code can be executed as a Python script.

Parameters

Example code 24: Obtain the quine code reconstructed pCMV-Target-AID

The Target-AID plasmid (pCMV-Target-AID) was constructed by assembling two fragments encoding the N- and C-terminus halves of Target-AID, which were both amplified from pcDNA3.1_pCMV-nCas-PmCDA1-ugi pH1-gRNA(HPRT) (Addgene 79620) using primer pairs RS045/HM129 and HM128/RS046, respectively, with a backbone fragment amplified from pCMV-ABE7.10 using RS047/RS048. The construction process was simulated by using quinable functions, and the GenBank file was generated. The quine code generated from the GenBank file by quine() successfully reconstructed the same GenBank file. The Python scripts for the following Example codes 24-27 can be found in "./demo/tutorial_ex24-28.ipynb".
(Expected runtime: less than 1 sec.)

Source code

from QUEEN.queen import *
︙(ommitted)
pCMV_Target_AID = QUEEN(record="./output/pCMV-Target-AID.gbk")
quine(pCMV_Target_AID, output="./output/pCMV-Target-AID_clone.py")

Shell commands

%python3 ./output/pCMV_Target_AID_clone.py > ./output/clone_pCMV-Target-AID.gbk
%diff -s ./output/pCMV_Target_AID.gbk ./output/clone_pCMV_Target_AID.gbk

Output

Files ./output/clone_pCMV-Target-AID.gbk and ./output/pCMV-Target-AID.gbk are identical.

Example code 25: Inheritance of operational histories

If a QUEEN_object is loaded from a QUEEN-generated GenBank file for a new DNA construction, the quine code of the original QUEEN_object will be inherited into the newly producing QUEEN_object. The following example demonstrates that a QUEEN_object representing a DNA fragment cropped from the QUEEN_object of pCMV-Target-AID holds not only the process history of the cropping but also the whole previous construction process of pCMV-Target-AID.
(Expected runtime: less than 1 sec.)

Source code (continued from the previous code)

description = "Extract a fragment spanning from 8,000 nt to 2,000 nt of pCMV-Target-AID"
cropdna(pCMV_Target_AID, 8000, 2000, product="fragment", process_description=description)
quine(fragment) 

Output (quine code generated from the "fragment" product)

︙(ommitted)
description5 = 'Extract a fragment spanning from 8,000 nt to 2,000 nt of pCMV-Target-AID'
cropdna(QUEEN.dna_dict['pCMV-BE4max_8'], start='8000/8000', end='2000/2000', project='pCMV-BE4max', product='fragment', process_description=description5)

There is an option import_history=False prepared for QUEEN() to disable the inheritance of operational process histories of previously generated QUEEN_objects to a newly producing QUEEN_object.

NOTE 1: Editing of quine code

quine() will provide each quinable process in a quine code with a unique process identifier in the process_id option, like "process_id=QUEEN_object.project–XXXXXXXXXXXXXXXXXXXXXXXX", where "Xs" represents md5() transformation of the quinable operation excluding the process_id and original_ids (described below). This process_id serves as a checksum to validate if any modification is provided to the operation code. Therefore, when a new QUEEN script is created by editing a quine code generated from an existing QUEEN_object or combining different process parts from multiple quine codes, the newly generating QUEEN_object will hold these previous process_ids. These process_ids will be passed over to a list original_ids of the corresponding new operation when a new quine code is generated from the new QUEEN_object. Hence, editing histories of quine codes and their inheritances can also be tracked and stored in QUEEN_objects.

NOTE 2: Recording of variable names

By default, QUEEN cannot track and record user-defined variable names of QUEEN_objects used in the original code. Therefore, the .project value of each QUEEN_object is used as its variable name when a quine code is generated. To generate a quine code with user-defined variable names for each operational step, the QUEEN_object needs to be generated with the Python command set_namespace(globals()) executed. This enables providing variable names of producing objects as arguments of their operational functions and, therefore, the recovery of variable names in quine codes.

For example, Example code 19 can be written in this format as follows.

Example code 26: Flip ampicillin-resistant gene in pX330 (variable embedding)

(Expected runtime: less than 1 sec.)

Original code

import sys 
from QUEEN.queen import * 
set_namespace(globals())
QUEEN(record="input/pX330.gbk", product="plasmid")
plasmid.searchfeature(query="^AmpR$", product="sites")
cutdna(plasmid, sites[0].start, sites[0].end, product="fragments")
flipdna(fragments[0], product="fragments0_rc")
joindna(fragments0_rc, fragments[1], topology="circular", product="new_plasmid")

Quine code

import sys
sys.path.append("/content/colab")
from QUEEN.queen import *
from QUEEN import cutsite as cs
set_namespace(globals())
QUEEN(record='input/pX330.gbk', product='plasmid', process_id='new_plasmid-9WZX2KEVGD9NVBR4DSWNBCSND320', original_ids=[])
plasmid.searchfeature(key_attribute='all', query='^AmpR$', product='sites', process_id='new_plasmid-52D5THPBJ8G961IHBTPEGI4JH321', original_ids=[])
cutdna(plasmid, sites[0].start, sites[0].end, product='fragments', process_id='new_plasmid-2XUXT7UUAIIY5UFUMC8TBPOHC322', original_ids=[])
flipdna(fragments[0], product='fragments0_rc', process_id='new_plasmid-30A7468VURMAE1DIOMS3A7JJP324', original_ids=[])
joindna(*[fragments0_rc, fragments[1]], topology='circular', product='new_plasmid', process_id='new_plasmid-649L08K2L92IMQ1SY8NDKX76B325', original_ids=[])

Visualization

QUEEN provides the following visualization functions.

Return

graphviz.dot.Digraph object

Example code 28: Visualization of the flow chart for pCMV-Target-AID construction

(Expected runtime: less than 1 sec.)

Source code (continued from the example code 24)

graph_a = visualizeflow(pCMV_Target_AID, sf=False,ip=True, grouping=False)
graph_b = visualizeflow(pCMV_Target_AID, sf=True, ip=True, grouping=False)
graph_c = visualizeflow(pCMV_Target_AID, sf=True, ip=True, grouping=True, pd=True)
graph_d = visualizeflow(pCMV_Target_AID) #default setting
graph1.render("pCMV-Target-AID_flow1")
graph2.render("pCMV-Target-AID_flow2")
graph3.render("pCMV-Target-AID_flow3")
graph4.render("pCMV-Target-AID_flow4")

Output figures