rgtjf / DeepLearning

How to build a BaseFeature class #7

Closed rgtjf closed 1 year ago

rgtjf commented 7 years ago

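It should meet the following requirements:
1. Record the extracted features, together with the original sentence and some metadata, as CSV.
2. Use the composite pattern, so that the names of all features in use can be collected.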

rgtjf commented 7 years ago

2. Composite pattern, so that the names of all features in use can be collected. See how lasagne collects all parameters and all names.
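A minimal sketch of how the composite idea could mirror lasagne's traversal: features form a tree, and a depth-first walk gathers the leaf names. `BaseFeature`, its `children` list, and `get_all_feature_names` are hypothetical names chosen for illustration, not code from this repository.

```python
class BaseFeature(object):
    """A named node; a node without children counts as a leaf feature."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add(self, feature):
        self.children.append(feature)
        return self  # allow chaining


def get_all_feature_names(feature):
    """Collect the names of all leaf features below `feature`,
    depth-first, in insertion order and without duplicates."""
    names, seen = [], set()
    stack = [feature]
    while stack:
        node = stack.pop()
        if node.children:
            # Push children reversed so they are visited in insertion order.
            stack.extend(reversed(node.children))
        elif node.name not in seen:
            seen.add(node.name)
            names.append(node.name)
    return names


root = BaseFeature('all')
ngrams = BaseFeature('ngrams').add(BaseFeature('unigram')).add(BaseFeature('bigram'))
root.add(ngrams).add(BaseFeature('length')).add(BaseFeature('unigram'))
print(get_all_feature_names(root))  # ['unigram', 'bigram', 'length']
```

An explicit stack is used instead of recursion, in the same spirit as the lasagne helper, so deep feature trees do not hit the recursion limit.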

rgtjf commented 7 years ago

1. Record the extracted features, together with the original sentence and some metadata, as CSV.

How can this be recorded automatically?
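One possible answer is to have the base class write a row per example whenever a feature is extracted. The sketch below uses only the standard `csv` module; the function name `dump_features` and the column layout (feature name, sentence, space-joined values) are assumptions made for illustration.

```python
import csv
import io


def dump_features(fp, feature_name, sentences, vectors):
    """Write one CSV row per example: feature name, original sentence,
    then the feature values joined by spaces."""
    writer = csv.writer(fp)
    writer.writerow(['feature', 'sentence', 'values'])
    for sentence, vector in zip(sentences, vectors):
        writer.writerow([feature_name, sentence, ' '.join(str(v) for v in vector)])


sentences = ['a cat sat', 'hello, world']
vectors = [[1, 0, 2], [0, 1, 0]]

# StringIO stands in for open('features.csv', 'w', newline='').
buf = io.StringIO()
dump_features(buf, 'unigram', sentences, vectors)
print(buf.getvalue())
```

Going through `csv.writer` rather than joining strings by hand means sentences that contain commas or quotes are escaped correctly.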

rgtjf commented 7 years ago

Ref:

from collections import deque


def get_all_layers(layer, treat_as_input=None):
    """
    This function gathers all layers below one or more given :class:`Layer`
    instances, including the given layer(s). Its main use is to collect all
    layers of a network just given the output layer(s). The layers are
    guaranteed to be returned in a topological order: a layer in the result
    list is always preceded by all layers its input depends on.

    Parameters
    ----------
    layer : Layer or list
        the :class:`Layer` instance for which to gather all layers feeding
        into it, or a list of :class:`Layer` instances.

    treat_as_input : None or iterable
        an iterable of :class:`Layer` instances to treat as input layers
        with no layers feeding into them. They will show up in the result
        list, but their incoming layers will not be collected (unless they
        are required for other layers as well).

    Returns
    -------
    list
        a list of :class:`Layer` instances feeding into the given
        instance(s) either directly or indirectly, and the given
        instance(s) themselves, in topological order.

    Examples
    --------
    >>> from lasagne.layers import InputLayer, DenseLayer
    >>> l_in = InputLayer((100, 20))
    >>> l1 = DenseLayer(l_in, num_units=50)
    >>> get_all_layers(l1) == [l_in, l1]
    True
    >>> l2 = DenseLayer(l_in, num_units=10)
    >>> get_all_layers([l2, l1]) == [l_in, l2, l1]
    True
    >>> get_all_layers([l1, l2]) == [l_in, l1, l2]
    True
    >>> l3 = DenseLayer(l2, num_units=20)
    >>> get_all_layers(l3) == [l_in, l2, l3]
    True
    >>> get_all_layers(l3, treat_as_input=[l2]) == [l2, l3]
    True
    """
    # We perform a depth-first search. We add a layer to the result list only
    # after adding all its incoming layers (if any) or when detecting a cycle.
    # We use a LIFO stack to avoid ever running into recursion depth limits.
    try:
        queue = deque(layer)
    except TypeError:
        queue = deque([layer])
    seen = set()
    done = set()
    result = []

    # If treat_as_input is given, we pretend we've already collected all their
    # incoming layers.
    if treat_as_input is not None:
        seen.update(treat_as_input)

    while queue:
        # Peek at the leftmost node in the queue.
        layer = queue[0]
        if layer is None:
            # Some node had an input_layer set to `None`. Just ignore it.
            queue.popleft()
        elif layer not in seen:
            # We haven't seen this node yet: Mark it and queue all incomings
            # to be processed first. If there are no incomings, the node will
            # be appended to the result list in the next iteration.
            seen.add(layer)
            if hasattr(layer, 'input_layers'):
                queue.extendleft(reversed(layer.input_layers))
            elif hasattr(layer, 'input_layer'):
                queue.appendleft(layer.input_layer)
        else:
            # We've been here before: Either we've finished all its incomings,
            # or we've detected a cycle. In both cases, we remove the layer
            # from the queue and append it to the result list.
            queue.popleft()
            if layer not in done:
                result.append(layer)
                done.add(layer)

    return result
rgtjf commented 7 years ago

Ref:

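from lasagne import utils  # provides utils.unique


def get_all_params(layer, **tags):
    # Gather every layer feeding into `layer`, in topological order.
    layers = get_all_layers(layer)
    # Concatenate the per-layer parameter lists, filtered by `tags`.
    params = sum([l.get_params(**tags) for l in layers], [])
    # Drop duplicates (e.g. shared parameters) while preserving order.
    return utils.unique(params)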
rgtjf commented 7 years ago

Lazy operation

feature = Feature('name')
feature.add(UniGramFeature('unigram'))
feature.add(BiGramFeature('bigram', load=True))
feature.add(TriGramFeature('trigram'))

feature.input = x, y
feature.extract()

MergeFeature([namelist], name)

for name in feature.feature_names:
    print(name)
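Assuming "lazy" here means that constructing or adding a feature does no work and the computation only happens in `extract()`, the idea could be sketched as follows. `LazyFeature`, `extract_fn`, and the `calls` counter are hypothetical; the `load=True` flag from the sketch above is not modeled.

```python
class LazyFeature(object):
    """Stores an extraction function but runs it only on demand."""

    def __init__(self, name, extract_fn):
        self.name = name
        self.extract_fn = extract_fn  # maps one sentence to a list of values
        self._cache = None
        self.calls = 0  # counts real computations, for illustration only

    def extract(self, sentences):
        # Compute on the first call; later calls reuse the cached result.
        # A fuller version would key the cache on the input.
        if self._cache is None:
            self.calls += 1
            self._cache = [self.extract_fn(s) for s in sentences]
        return self._cache


length = LazyFeature('length', lambda s: [len(s.split())])
print(length.calls)  # 0: nothing has been computed yet

first = length.extract(['a cat sat', 'hello world'])
second = length.extract(['a cat sat', 'hello world'])
print(first, length.calls)  # [[3], [2]] 1
```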
rgtjf commented 7 years ago

Combining features with the classifier

model = Model('name', Classify)
# model.train()
# model.test()

model.add(UniGramFeature('unigram'))
model.add(BiGramFeature('bigram', load=True))
model.add(TriGramFeature('trigram'))

emb_model = Model('name', Classify)
emb_model.add(Features)
emb_model.add(Features)

model.add(emb_model)

model.train()
model.test()
rgtjf commented 7 years ago

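How should the dict be created?

  1. Build the dictionary from the dataset: train_set for sure; dev_set is an open question (do features need to be extracted from dev_set?)
  2. Build the dictionary from external data: load the dictionary directly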
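For the first option, one common convention (not something this thread settles) is to build the dictionary from train_set only and map tokens that appear only in dev_set to a reserved unknown entry. The names `build_vocab`, `min_count`, and `'<unk>'` below are assumptions for illustration.

```python
from collections import Counter


def build_vocab(sentences, min_count=1, unk='<unk>'):
    """Map each token seen at least `min_count` times to an integer index.
    Index 0 is reserved for unknown tokens."""
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = {unk: 0}
    # Sort by descending frequency, then alphabetically, for a stable order.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


train_set = ['the cat sat', 'the dog sat']
dev_set = ['the bird sat']

vocab = build_vocab(train_set)  # the dictionary comes from train_set only
dev_ids = [[vocab.get(tok, vocab['<unk>']) for tok in s.split()] for s in dev_set]
print(vocab)    # {'<unk>': 0, 'sat': 1, 'the': 2, 'cat': 3, 'dog': 4}
print(dev_ids)  # [[2, 0, 1]]
```

With this convention dev_set features can still be extracted, but dev_set never changes the dictionary.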
rgtjf commented 7 years ago

class dict_loader: should support a class-level singleton pattern and a function-level singleton pattern

@singleton  # assumed to be a class decorator defined elsewhere
class dict_loader(object):
    def __init__(self):
        self.stopwords = None

    def load_stopwords(self):
        """Load stopwords from file on first use, then reuse them."""
        if self.stopwords is None:
            with open('english.stopwords.txt', 'r') as fp:
                self.stopwords = [line.strip('\r\n ') for line in fp]
        return self.stopwords

# dict_loader().load_puntcs()
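The `singleton` decorator is not defined anywhere in this thread. A minimal sketch of what the class-level and function-level variants might look like is given below; `run_once`, `DictLoader`, and the stand-in word list are hypothetical.

```python
def singleton(cls):
    """Class-level variant: every call to the class returns the same instance."""
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


def run_once(func):
    """Function-level variant: the body runs once, later calls reuse the result."""
    cache = []

    def wrapper():
        if not cache:
            cache.append(func())
        return cache[0]

    return wrapper


@singleton
class DictLoader(object):
    def __init__(self):
        self.stopwords = None


@run_once
def load_stopwords():
    # Stand-in for reading a file; the word list is made up for illustration.
    return {'the', 'a', 'of'}


print(DictLoader() is DictLoader())          # True
print(load_stopwords() is load_stopwords())  # True
```

Note that after decoration `DictLoader` is a function rather than a class, so `isinstance` checks against it no longer work; a metaclass or a module-level instance would avoid that.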
rgtjf commented 7 years ago

The dict should support a one-to-one mapping key <==> index
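A small sketch of such a two-way mapping, keeping a dict for key to index and a list for index to key so the two always stay in sync. The class name `IndexedDict` and its methods are hypothetical.

```python
class IndexedDict(object):
    """Keeps key -> index and index -> key in sync (one-to-one)."""

    def __init__(self):
        self._key_to_index = {}
        self._index_to_key = []

    def add(self, key):
        """Return the index of `key`, assigning the next free one if it is new."""
        if key not in self._key_to_index:
            self._key_to_index[key] = len(self._index_to_key)
            self._index_to_key.append(key)
        return self._key_to_index[key]

    def index(self, key):
        return self._key_to_index[key]

    def key(self, index):
        return self._index_to_key[index]

    def __len__(self):
        return len(self._index_to_key)


d = IndexedDict()
for tok in 'the cat sat on the mat'.split():
    d.add(tok)
print(len(d), d.index('sat'), d.key(2))  # 5 2 sat
```

Because indices are handed out consecutively from 0, the list lookup in `key()` is O(1) and no index can ever map to two keys.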